How-To Tutorials

08 Apr 2024

12 min read

Apple’s ReALM, Google DeepMind’s Gecko, X.ai's Grok 1.5, Salesforce AI’s Moira, Stability AI’s Stable Audio 2.0, TWIN-GPT, ChatGPT Instant usage

08 Apr 2024

Subscribe to our Data Pro newsletter for the latest insights. Don't miss out – sign up today!👋 Hello,Welcome to DataPro#88 – Your portal to the innovations in Data Science & Machine Learning! 🚀 In this edition, you'll find: ⚙️ LLMs & GPTs Unleashed TWIN-GPT: Digital Twins for Clinical Trials. Apple’s ReALM: AI with contextual understanding. Stability AI’s Stable Audio 2.0: Audio synthesis revolution. Salesforce AI’s Moira: Enhancing customer engagement. Google DeepMind’s Gecko: Versatile Text Embeddings. X.ai's Grok 1.5: Enhanced reasoning and context. ✨ What's Fresh & Exciting Distribute LLMs with llamafile: 5 Simple Steps. Dockerized Python Environment: The Elegant Way. Knowledge Distillation: Clone Powerful LLMs. Sora’s Diffusion Transformer (DiT): A Deep Dive. Generative AI: Copyright Reckoning. OpenAI Agent: Function Calling Capabilities. ⚡ Industry Pulse AWS & Mistral AI: Democratizing generative AI. Amazon SageMaker: No-code to code-first ML. Google Cloud Next: Database success stories. Google’s SEEDS in Weather Forecasting: AI quantifies uncertainty. Microsoft’s LLMs in the Imaginarium: Tool Learning. OpenAI: Fine-tuning API and custom models. ChatGPT: Instant usage. Synthetic Voices: Challenges and Opportunities. 📚 Packt's Latest Gem MATLAB for Machine Learning - Second Edition, By Giuseppe Ciaburro. DataPro Newsletter is not just a publication; it’s a comprehensive toolkit for anyone serious about mastering the ever-changing landscape of data and AI. Grab your copy and start transforming your data expertise today! 📥 Feedback on the Weekly EditionTake our weekly survey and get a free PDF copy of our best-selling book, "Interactive Data Visualization with Python - Second Edition."We appreciate your input and hope you enjoy the book!Share your Feedback!Cheers,Merlyn ShelleyEditor-in-Chief, PacktSign Up | Advertise | Archives🔰 GitHub Finds: Any of These Repos in Your Toolbox?🛠️ UpstageAI/dataverse: Dataverse simplifies ETL pipelines in Python, providing a user-friendly solution for data processing and management, accessible to all. 🛠️ GAP-LAB-CUHK-SZ/gaustudio: GauStudio is a modular framework for 3D Gaussian Splatting, providing streamlined pipelines and tools for easier implementation and deployment. 🛠️ TencentARC/BrushNet: BrushNet is a text-guided image inpainting model that enhances pre-trained diffusion models, focusing on divided features and dense control. 🛠️ agiresearch/AIOS: AIOS embeds LLMs into OS, enhancing resource allocation, context switch, concurrent execution, tool service, access control, and toolkit availability for developers. 🛠️ jasonppy/VoiceCraft: VoiceCraft excels in speech editing and zero-shot text-to-speech, requiring only a few seconds of reference to clone or edit voices. 📚 Expert Insights from Packt CommunityMATLAB for Machine Learning - Second Edition, By Giuseppe Ciaburro.Anomaly Detection in MATLAB Throughout the life cycle of a physical system, the occurrence of failures or malfunctions poses a potential threat to its normal functioning. To safeguard against critical interruptions, it becomes imperative to implement an anomaly detection system within the facility. Termed as a fault diagnosis system, this mechanism is designed to identify potential malfunctions within the monitored system. The pursuit of fault detection stands as a pivotal and defining phase in maintenance interventions, demanding a systematic and deterministic approach to comprehensively analyze all conceivable causes that might have led to the malfunction. Anomaly detection overview Anomaly detection is a technique used in data analysis and ML to identify data points or patterns that deviate significantly from the expected or normal behavior within a dataset. Anomalies, also known as outliers, are data points that do not conform to most of the data and may indicate errors, fraud, unusual events, or other important information. Anomaly detection has various applications across different domains, such as cybersecurity, industrial quality control (QC), finance, healthcare, and more. We can start to get an overview of different types of anomalies to understand what is intended with this term, we will list some types of anomalies: Point anomalies: These are individual data points that are considered anomalies, such as a single fraudulent transaction in a credit card dataset. Contextual anomalies: These are anomalies that are context-dependent. A data point might not be an anomaly on its own but is unusual in a particular context or time, such as a sudden spike in web traffic during a holiday sale. Collective anomalies: These are anomalies that are identified by examining a group of data points collectively. These anomalies involve patterns or relationships between data points. There are several methods for addressing anomaly detection problems, ranging from simple statistical techniques to complex ML algorithms. The choice of method depends on the nature of the data and the specific problem you are trying to solve. Here, we are listing the most used ones: Statistical methods: Statistical techniques such as z-scores, percentiles, and boxplots can be used to identify anomalies based on deviations from the mean or median of the data distribution. ML: Supervised, unsupervised, and semi-supervised ML algorithms can be used for anomaly detection. Some popular methods include Isolation Forest, One-Class Support Vector Machine (One-Class SVM), autoencoders (AEs), and k-means clustering. Time series analysis: Specialized techniques are used for detecting anomalies in time series data, such as autoregressive (AR) models, exponential smoothing, and moving averages (MAs). Density estimation: Methods such as kernel density estimation (KDE) and Gaussian Mixture Models (GMMs) are used to estimate the probability density function of the data and identify anomalies as low-density regions. Deep learning (DL): Neural networks (NNs), especially deep AEs (DAEs) and recurrent NNs (RNNs), are used for anomaly detection in high-dimensional data or sequences. Ensemble methods: Combining multiple anomaly detection models can improve overall performance and robustness. In addressing anomaly detection problems, we have to face some challenges. For example, determining an appropriate threshold for defining anomalies can be challenging. Imbalanced datasets, where anomalies are rare, can make model training and evaluation tricky. Handling high-dimensional data and noisy datasets can also be challenging. Anomaly detection is a valuable tool for identifying rare but potentially important events or patterns in large datasets. The choice of method depends on the specific domain, data characteristics, and the nature of anomalies that need to be detected. Discover more insights from "MATLAB for Machine Learning - Second Edition" by Giuseppe Ciaburro. Unlock access to the full book and a wealth of other titles with a 7-day free trial in the Packt Library. Start exploring today!Read Here!⚡ Tech Tidbits: Stay Wired to the Latest Industry Buzz! AWS ML Made Easy 🌀 AWS and Mistral AI commit to democratizing generative AI with a strengthened collaboration: The article discusses the growing use of generative AI applications across industries, facilitated by Amazon Bedrock. It highlights Mistral AI's Mistral Large model, now available on Amazon Bedrock, offering advanced language capabilities. This collaboration aims to provide customers with diverse model options to suit their specific business needs, promoting innovation in AI technology. 🌀 Seamlessly transition between no-code and code-first machine learning with Amazon SageMaker Canvas and Amazon SageMaker Studio: This post discusses Amazon SageMaker Studio, an integrated ML development environment, and SageMaker Canvas, a no-code ML tool, highlighting their features and integration for seamless collaboration between non-ML and ML experts. Google Research 🌀 Get inspired: Database success stories at Google Cloud Next. This blog post previews Google Cloud Next '24, focusing on customers using Google Cloud databases for transformative purposes. It highlights sessions featuring Nuro, Lightricks, Bayer, Yahoo!, and Statsig, showcasing their innovative use cases.🌀 Generative AI to quantify uncertainty in weather forecasting: Google is advancing weather forecasting with innovations like MetNet-3 and SEEDS, a generative AI model. SEEDS efficiently generates probabilistic ensembles, addressing the butterfly effect's uncertainty, and offers cost-effective solutions for extreme weather events. Microsoft Research🌀 LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error. This research enhances large language models' (LLMs) tool usage accuracy through simulated trial and error (STE), inspired by biological systems. STE improves learning by simulating tool use scenarios, interacting with tools, and leveraging short and long-term memory. Results show significant performance boosts over existing methods.OpenAI Updates🌀 Introducing improvements to the fine-tuning API and expanding our custom models program: This update discusses techniques to improve model performance, such as retrieval-augmented generation (RAG) and fine-tuning and introduces new API features for developers to control their fine-tuning jobs, enhancing model quality, reducing costs, and latency. 🌀 Start using ChatGPT instantly: This new initiative aims to make AI more accessible by allowing instant access to ChatGPT without the need to sign up. It targets those curious about AI's potential but hesitant to set up an account, offering a seamless experience for learning, creative inspiration, and answering questions. 🌀 Navigating the Challenges and Opportunities of Synthetic Voices: Voice Engine is a model by OpenAI that generates natural-sounding speech from text input and a short audio sample, closely resembling the original speaker. They're sharing insights from a small-scale preview, highlighting its potential for various applications like reading assistance and personalized responses in education. Email Forwarded? Join DataPro Here!🔍 From Bits to BERT: Keeping Up with LLMs & GPTs 🌀 TWIN-GPT: Digital Twins for Clinical Trials via LLM. The research explores virtual clinical trials' benefits in healthcare, emphasizing patient safety and cost reduction. Existing methods struggle with prediction accuracy due to limited data. TWIN-GPT, a proposed approach, uses large language models to create personalized digital twins, improving predictions and showcasing digital twins' potential in healthcare. 🌀 Apple’s ReALM: AI that can “See” to understand the context: ReALM (Reference Resolution As Language Modeling) addresses the challenge of context understanding, including non-conversational entities like on-screen elements. By leveraging Language Models (LLMs), it demonstrates significant improvements in reference resolution, even outperforming GPT-4, offering over 5% gains for on-screen references. 🌀 Stability AI’s Stable Audio 2.0: Stable Audio 2.0 introduces a groundbreaking AI-generated audio standard, offering high-quality, full tracks up to three minutes long at 44.1kHz stereo. It features audio-to-audio generation, honoring creator rights, and expands creative possibilities, available for free on the Stable Audio website. 🌀 Salesforce AI’s Moira: Moirai is a universal time series forecasting model designed to address diverse forecasting tasks across various domains, frequencies, and variables in a zero-shot manner. It tackles key challenges in forecasting and offers robust performance, making it valuable for IT operations, sales forecasting, and more. 🌀 Google DeepMind’s Gecko: Versatile Text Embeddings Distilled from LLMs. Gecko is a compact text embedding model that achieves strong retrieval performance by distilling knowledge from large language models (LLMs). Its two-step distillation process, generating synthetic paired data and refining data quality, outperforms larger models on the Massive Text Embedding Benchmark. Gecko with 256 dimensions outperforms all entries with 768 dimensions; Gecko with 768 dimensions competes with models 7x larger and 5x higher dimensional embeddings. 🌀 X.ai Unveils Grok 1.5: Enhanced Reasoning and Long Context Features. Grok-1.5, the latest version of x.ai's Grok model, offers improved reasoning and long context capabilities. It excels in coding and math tasks, scoring 50.6% on MATH and 90% on GSM8K benchmarks. Grok-1.5 can process long contexts up to 128K tokens and boasts robust infrastructure for large-scale training. Early testers and existing Grok users on the x.ai platform will soon have access to Grok-1.5, with further features expected to roll out gradually. ✨ On the Radar: Catch Up on What's Fresh🌀 Distribute and Run LLMs with llamafile in 5 Simple Steps: This blog introduces llamaFile, a framework that simplifies using large language models (LLMs) by providing a one-file executable that runs locally without installation. It explains how to use llamaFile with the LLaVa model, a 7-billion-parameter model quantized to 4 bits, for tasks like chat, image uploading, and question-answering. 🌀 Setting A Dockerized Python Environment — The Elegant Way. This blog post demonstrates a more elegant method for setting up a dockerized Python development environment using VScode and the Dev Containers extension. It provides step-by-step instructions and prerequisites, including Docker Desktop, a Docker Hub account, and VScode with the Dev Containers extension installed. The tutorial focuses on using the official Python image (`python:3.10`) and explains the Dev Containers extension's role in creating an isolated VScode session inside a docker container. 🌀 Clone the Abilities of Powerful LLMs into Small Local Models Using Knowledge Distillation: This post explores the use of specialized, smaller-scale language models for specific NLP tasks, such as grammatical error correction. It discusses the process of constructing tailored models through data annotation and fine-tuning, and the use of knowledge distillation to automate labeling. The post provides a workflow for distilling knowledge from a large language model to a smaller one, using prompts and APIs, and demonstrates this process in the context of building a grammatical error correction model. 🌀 Deep Dive into Sora’s Diffusion Transformer (DiT) by Hand: This blog introduces Sora, OpenAI's text-to-video model, explaining its unique approach combining diffusion transformer and transformer strength for video prediction. It explores key concepts like diffusion, dimension reduction, and noise addition, offering insights into how Sora converts text prompts into realistic videos. Ideal for AI enthusiasts and those interested in video generation technologies. 🌀 The Coming Copyright Reckoning for Generative AI: This blog explores the complexities of copyright law in America, particularly in the context of generative AI. It discusses key concepts like original works, fair use, and the implications of generative AI on copyright. It also delves into legal cases and future considerations, offering insights for data scientists and AI enthusiasts. 🌀 Create an Agent with OpenAI Function Calling Capabilities: This article explores the advancements and challenges in developing AI-powered applications in 2024. It discusses how AI streamlines app features for a better user experience and introduces OpenAI's Function Calling to simplify structured data extraction. The article also highlights the ongoing innovations and the future of AI applications. See you next time!

0
0
49070

article-image-bi-pro49-microsoft-fabric-lifecycle-management-data-factory-adds-cicd-to-fabric-data-pipelines-database-mirroring-aws-well-architected-data-analytics-lens

Merlyn Shelley

04 Apr 2024

11 min read

BI-Pro#49: Microsoft Fabric Lifecycle Management, Data Factory Adds CI/CD to Fabric Data Pipelines, Database Mirroring, AWS Well-Architected Data Analytics Lens

Merlyn Shelley

04 Apr 2024

11 min read

Subscribe to our BI Pro newsletter for the latest insights. Don't miss out – sign up today!👋 Hello,Welcome to BI-Pro #49, your ultimate guide to data and BI insights! 🚀 ⏩ What's Inside? Python Simplified: Master data validation with Pydantic. Visualize Like a Pro: 30+ tools for stunning data visuals. R for Bioinformatics: Custom visuals for bio data. Interactive Data: JavaScript meets Handsontable. Seaborn Stories: Craft data tales with line plots. MetaGPT Insights: Next-gen data solutions unveiled. 🏭 Industry Scoop: Power BI’s Latest: March's must-know features. Fabric Innovations: Updates and new tools from Microsoft Fabric. AWS Well-Architected Data Analytics Lens: Analytics strategies for the real world. Google Cloud Savings: Cut costs on ETL workflows. Tableau Journeys: From student to BI analyst. 💎 Expert Takes: Deep Dive into Python Deep Learning: The latest from Packt. 👉 Community Buzz: Twitch Chat Analysis, Graph Networks, LLM Data Quality, and Ethical AI: Key conversations this week! Dive into the trends shaping data and BI today! 📥 Feedback on the Weekly EditionTake our weekly survey and get a free PDF copy of our best-selling book, "Interactive Data Visualization with Python - Second Edition."📣 And here's the twist – we're tuning into YOUR frequency! Inspired by a reader's request, we're launching a column just for you. Got a burning question or a topic you're itching to dive into? Drop your suggestions in our content box – because your journey of discovery is our blueprint.We appreciate your input and hope you enjoy the book!Share your thoughts and opinions here! Cheers,Merlyn ShelleyEditor-in-Chief, PacktThanks for reading Packt BI-Pro! Subscribe for free to receive new posts and support our work.Pledge your supportSign Up | Advertise | Archives🚀 GitHub's Most Sought-After Repos🌀 man-group/ArcticDB: ArcticDB is a high-performance DataFrame database designed for Python Data Science, with a Python-centric API for Pandas DataFrames. 🌀 gradio-app/gradio: Gradio is an open-source Python package for building demos or web apps for ML models or Python functions, with easy sharing via built-in features. 🌀 Sinaptik-AI/pandas-ai: PandasAI is a Python library using generative AI to explore, clean, and analyze data with natural language queries.🌀 OpenRefine/OpenRefine: OpenRefine is a powerful Java-based tool for loading, understanding, cleaning, reconciling, and augmenting data, accessible from a web browser. 🌀 Kanaries/pygwalker: PyGWalker simplifies Jupyter Notebook workflows by converting pandas dataframes into interactive user interfaces for data analysis and visualization. 🌀 cleanlab/cleanlab: cleanlab aids in data and label cleaning by identifying issues in ML datasets automatically, enabling better model training with real-world data.Email Forwarded? Join BI-Pro Here!🔮 Data Viz with Python Libraries 🌀 Pydantic Tutorial: Data Validation in Python Made Simple. This blog tutorial explains how to use Pydantic, a data validation and serialization library in Python, to validate and serialize data classes, offering support for custom validators and Python's type hints for field validation. 🌀 30+ Data Visualization Libraries, Frameworks and Apps, Mastering Data Presentation: Explore over 30 data visualization tools like Metabase, Gephi, and Grafana, offering a range of features to transform raw data into meaningful visualizations for better decision-making in industries like tech, healthcare, finance, and marketing. 🌀 Mastering Data Visualization in R for Bioinformatics: The article delves into data visualization in R for bioinformatics, stressing its role in understanding complex biological data, communicating findings, hypothesis generation, and decision-making. It also discusses Anscombe's Quartet, highlighting the importance of visualizing data before analysis and the limitations of summary statistics. 🌀 Integrating JavaScript charting libraries with Handsontable: The article guides developers on integrating Highcharts, Recharts, and Chart.js with Handsontable for data visualization. It explains the features of each library and provides demos for creating a stock portfolio with interactive charts. 🌀 Data Visualization with Seaborn Line Plot: The article introduces Seaborn, a Python library for data visualization, built on top of Matplotlib. It covers installation and demonstrates creating single line plots and customizing styles for better presentation of data. 🌀 MetaGPT’s Data Interpreter: SOTA Open Source LLM-based Data Solutions. MetaGPT introduces its Data Interpreter, a new agent for streamlined data interpretation and analysis. The Data Interpreter employs advanced techniques for real-time data adaptability, tool integration, and logical inconsistency identification, showcasing superior performance in machine learning tasks. ⚡Stay Informed with Industry HighlightsPower BI 🌀 Power BI March 2024 Feature Summary: The Power BI update introduces visual calculation editing, data model editing in the Power BI Service, and report subscription delivery to OneDrive SharePoint. A new Microsoft Fabric certification exam, DP-600, is also available, with free certification opportunities through the Fabric AI Skills Challenge. 🌀 Announcing the Public Preview of Database Mirroring in Microsoft Fabric: Mirroring, now in Public Preview, allows seamless integration of databases into Microsoft Fabric's OneLake, providing real-time insights without ETL. It simplifies data replication and warehousing, enabling easy data access and analysis across different sources, including data lakes and warehouses. 🌀 Get data with Power Query available in Power BI Report Builder (Preview): Power BI Report Builder now allows connecting to 100+ data sources like Snowflake, Databricks, and AWS Redshift. You can transform data using M-Query for paginated reports. Install the latest version and connect from the "Data" tab. Microsoft Fabric🌀 Microsoft Fabric March 2024 Update: This update brings new features like OneLake File Explorer, Autotune Query Tuning, and Test Framework for Power Query SDK in VS Code to Power BI, enhancing reporting, modeling, service, mobile, and developer experiences. 🌀 Data Factory Adds CI/CD to Fabric Data Pipelines: Fabric engineers with Azure Synapse Analytics and Azure Data Factory experience can now utilize Git integration and built-in Deployment Pipelines in Data Factory data pipelines in Fabric. This public preview offers source control, CI/CD features, and collaborative development environments, enhancing data analytics projects. 🌀 Microsoft Fabric Lifecycle Management – Getting started with Git Integration and Deployment Pipelines: Microsoft Fabric makes Lifecycle Management easy, enabling continuous releases through Git and Deployment Pipelines. Git allows reliable updates for supported items like Lakehouse, Notebooks, and Reports, while Deployment Pipelines clone content between stages like DEV, TEST, UAT, and PROD. AWS BI 🌀 Announcing the AWS Well-Architected Data Analytics Lens: The Data Analytics Lens helps assess and improve analytics platforms on AWS. It offers best practices, such as building ACID-compliant data lakes and leveraging Serverless for data pipelines, aligned with the AWS Well-Architected Framework's pillars for secure, efficient, and cost-effective solutions. 🌀 Improve healthcare services through patient 360: A zero-ETL approach to enable near real-time data analytics. The post discusses how healthcare providers can improve patient care by leveraging AWS services for real-time analytics and personalized healthcare, focusing on a zero-ETL approach to data integration.Google Cloud Dat🌀 Enrich streaming data in Bigtable with Dataflow: The post discusses the importance of event stream processing in data engineering and introduces Apache Beam's Enrichment transform, which simplifies the process of enriching streaming data with Bigtable, improving data context and enabling more meaningful analysis.🌀 Dataflow at-least-once vs. exactly-once streaming modes: The post compares exactly-once and at-least-once processing modes in Dataflow Streaming Engine for streaming jobs. It explains the trade-offs between the two modes and provides guidance on choosing the right mode based on use case requirements. Tableau🌀 Data is both art and science - My Tableau Story: Andy Cotgreave. The post highlights Andy Cotgreave's journey from a data analyst at Oxford to becoming a Senior Technical Evangelist at Tableau. It emphasizes the importance of community engagement, innovation, building a portfolio, and having fun in data visualization. 🌀 Student to BI Analyst, How Tableau Can Lead to a Successful Data Career: This blog discusses Karolina Grodzinska's data visualization journey, from discovering Tableau to winning Iron Viz: Student Edition and becoming a Business Intelligence Analyst at Schneider Electric. Karolina emphasizes the importance of an active Tableau Public profile in career development and shares tips for building a strong portfolio and networking with the Tableau Community. ✨ Expert Insights from Packt CommunityPython Deep Learning - Third Edition - By Ivan VasilevDeveloping NN models for edge devices with TF Lite TF Lite is a TF-derived set of tools that allows us to run models on mobile, embedded, and edge devices. Its versatility is part of TF’s appeal for industrial applications (as opposed to research applications, where PyTorch dominates).The key paradigm of TF Lite is that the models run on-device, contrary to client-server architecture, where the model is deployed on remote, more powerful, hardware. This organization has the following implications (both good and bad): Low-latency execution: The lack of server-round trip significantly reduces the model inference time and allows us to run real-time applications. Privacy: The user data never leaves the device. Internet connectivity: Internet connectivity is not required. Small model size: The devices have limited computational ability, hence the need for small and computationally efficient models. More specifically, TF Lite models are stored in the FlatBuffers (https://flatbuffers.dev/) special efficient portable format, identified by the .tflite file extension. Besides its small size, it allows us to access data directly without parsing/unpacking it first. TF Lite models support a subset of the TF Core operations and allow us to define custom ones: Low power consumption: The devices often run on battery. Divergent training and inference: NN training is a lot more computationally intensive compared to inference. Because of this, the model training runs on a different, more powerful, piece of hardware than the actual devices, where the models will run inference. In addition, TF Lite has the following key features: Multi-platform and multi-language support, including Android (Java), iOS (Objective-C and Swift) devices, web (JavaScript), and Python for all other environments. Google provides a TF Lite wrapper API called MediaPipe Solutions (https://developers.google.com/mediapipe, https://github.com/google/mediapipe/), which supersedes the previous TF Lite API. Optimized for performance. It has end-to-end solution pipelines. TF Lite is oriented toward practical applications, rather than research. Because of this, it includes different pipelines for common ML tasks such as image classification, object detection, text classification, and question answering among others. The computer vision pipelines use modified versions of EfficientNet or MobileNet, and the natural language processing pipelines use BERT-based models. So, how does TF Lite model development work? First, we’ll select a model in one of the following ways: An existing pre-trained .tflite model (https://tfhub.dev/s?deployment-format=lite). Use MediaPipe Model Maker (https://developers.google.com/mediapipe/solutions/model_maker) to apply feature engineering transfer learning on an existing .tflite model with a custom training dataset. Model Maker only works with Python. Convert a full-fledged TF model into .tflite format. Discover more insights from 'Python Deep Learning - Third Edition' by Ivan Vasilev. Unlock access to the full book and a wealth of other titles with a 7-day free trial in the Packt Library. Start exploring today! Read Here💡 What's the Latest Scoop from the BI Community? 🌀 Real-Time Twitch Chat Sentiment Analysis with Apache Flink: This blog explores building a real-time sentiment analysis application for Twitch chat using Apache Flink. It covers setting up the project, reading Twitch chat messages, performing sentiment analysis, and concludes with a demo. 🌀 Entity Type Prediction with Relational Graph Convolutional Network (PyTorch): This post discusses a Python setup for predicting entity types on heterogeneous graphs using the Relational Graph Convolutional Network (R-GCN) and the RGCNConv module from PyTorch. It explains knowledge graphs, entity type prediction, and the R-GCN model. 🌀 Data Quality Error Detection powered by LLMs: This article explores automating the identification of data errors in tabular datasets using Large Language Models (LLMs). It discusses the Data Dirtiness Score, challenges in data cleaning, and the potential of LLMs in detecting data quality issues. 🌀 Building Ethical AI Starts with the Data Team — Here’s Why: This article discusses the ethical considerations of AI, focusing on model bias, AI usage, and data responsibility. It emphasizes the role of data teams in ensuring ethical AI and suggests steps for data teams to take towards a more ethical future. See you next time!

0
0
23293

Merlyn Shelley

02 Apr 2024

10 min read

Databricks' DBRX, Stability AI's Stable Code Instruct 3B, SambaNova's Samba CoE v0.2, FrugalGPT, Advanced RAG Patterns on Amazon SageMaker

Merlyn Shelley

02 Apr 2024

10 min read

Subscribe to our Data Pro newsletter for the latest insights. Don't miss out – sign up today!👋 Hello,Welcome to DataPro#87 – Your Gateway to the Cutting-Edge of Data Science & Machine Learning! 🚀 Dive into this edition to explore: ⚙️ LLMs & GPTs Unleashed Samba CoE v0.2: SambaNova's Speedy AI Models Efficient Training of Language Models with OpenAI AI21's Revolutionary SSM-Transformer Model: Jamba Databricks' DBRX: The New Open LLM Benchmark Stable Code Instruct 3B: Stability AI's Latest Offering HyperLLaVA: Boosting Multimodal Language Models ✨ What's Fresh & Exciting FrugalGPT: Cutting LLM Operating Costs Building a Reliable AI Agent from Scratch with OpenAI Tool Calling Fine-Tuning Instruct Models over Raw Text Data Crafting an OpenAI-Compatible API ⚡ Industry Pulse: Deciphering Advanced RAG Patterns on Amazon SageMaker Unveil the Future with AutoBNN: Mastering Probabilistic Time Series Forecasting! Engaging with Microsoft Copilot (web): Learning from Interaction 📚 Packt's Latest Gem "Principles of Data Science - Third Edition" by Sinan Ozdemir DataPro Newsletter is not just a publication; it’s a comprehensive toolkit for anyone serious about mastering the ever-changing landscape of data and AI. Grab your copy and start transforming your data expertise today! 📥 Feedback on the Weekly EditionTake our weekly survey and get a free PDF copy of our best-selling book, "Interactive Data Visualization with Python - Second Edition."We appreciate your input and hope you enjoy the book!Share your Feedback!Cheers,Merlyn ShelleyEditor-in-Chief, PacktSign Up | Advertise | Archives🔰 GitHub Finds: Any of These Repos in Your Toolbox?🛠️ Zejun-Yang/AniPortrait: AniPortrait is a new framework for creating high-quality animations using audio input and a reference portrait image, with face reenactment capabilities.🛠️ agiresearch/AIOS: AIOS embeds large language models into operating systems, enabling smarter resource allocation, context switching, and concurrent agent execution, advancing AGI. 🛠️ lichao-sun/Mora: Mora is a multi-agent framework for video generation, enhancing OpenAI's Sora capabilities through collaborative visual agents for diverse tasks. 🛠️ jasonppy/VoiceCraft: VoiceCraft is a high-performing neural codec language model for speech editing and zero-shot text-to-speech, excelling with diverse real-world data. 🛠️ dvlab-research/MiniGemini: Mini-Gemini enhances LLMs (Large Language Models) from 2B to 34B, integrating image understanding, reasoning, and generation, inspired by LLaVA. 🛠️ Picsart-AI-Research/StreamingT2V: StreamingT2V is a technique for creating long videos with rich motion dynamics, ensuring temporal consistency and high image quality. 📚 Expert Insights from Packt Community"Principles of Data Science - Third Edition" by Sinan Ozdemir. The Five Steps of Data Science A question I’ve gotten at least once a month for the past decade is What’s the difference between data science and data analytics? One could argue that there is no difference between the two; others will argue that there are hundreds of differences! I believe that, regardless of how many differences there are between the two terms, the following applies: Data science follows a structured, step-by-step process that, when followed, preserves the integrity of the results and leads to a deeper understanding of the data and the environment the data comes from. As with any other scientific endeavor, this process must be adhered to, or else the analysis and the results are in danger of scrutiny. On a simpler level, following a strict process can make it much easier for any data scientist, hobbyist, or professional to obtain results faster than if they were exploring data with no clear vision. While these steps are a guiding lesson for amateur analysts, they also provide the foundation for all data scientists, even those in the highest levels of business and academia. Every data scientist recognizes the value of these steps and follows them in some way or another. Overview of the five steps The process of data science involves a series of steps that are essential for effectively extracting insights and knowledge from data. These steps are presented as follows: Asking an interesting question: The first step in any data science project is to identify a question or challenge that you want to address with your analysis. This involves finding a topic that is relevant, important, and that can be addressed with data. Obtaining the data: Once you have identified your question, the next step is to collect the data that you will need to answer it. This can involve sourcing data from a variety of sources, such as databases, online platforms, or through data scraping or data collection methods. Exploring the data: After you have collected your data, the next step is to explore it and get a better understanding of its characteristics and patterns. This might involve examining summary statistics, visualizing the data, or applying statistical or machine learning (ML) techniques to identify trends or relationships. Modeling the data: Once you have explored your data, the next step is to build models that can be used to make predictions or inform decision-making. This might involve applying ML algorithms, building statistical models, or using other techniques to find patterns in the data. Communicating and visualizing the results: Finally, it’s important to communicate your findings to others in a clear and effective way. This might involve creating reports, presentations, or visualizations that help to explain your results and their implications. By following these five essential steps, you can effectively use data science to solve real-world problems and extract valuable insights from data. It’s important to note that different data scientists may have different approaches to the data science process, and the steps outlined previously are just one way of organizing the process. Some data scientists might group the steps differently or include additional steps such as feature engineering or model evaluation. Despite these differences, most data scientists agree that the steps listed previously are essential to the data science process. Whether they are organized in this specific way or not, these steps are all crucial for effectively using data to solve problems and extract valuable insights. Let’s dive into these steps one by one.Discover more insights from "Principles of Data Science - Third Edition" by Sinan Ozdemir. Unlock access to the full book and a wealth of other titles with a 7-day free trial in the Packt Library. Start exploring today! Read Here!⚡ Tech Tidbits: Stay Wired to the Latest Industry Buzz! AWS ML Made Easy 🌀 Advanced RAG patterns on Amazon SageMaker: This post discusses how customers across various industries are utilizing large language models (LLMs) like Mixtral-8x7B Instruct to build generative AI applications such as QnA chatbots and search engines. It highlights the challenges and solutions in improving the accuracy and performance of these applications, focusing on Retrieval Augmented Generation (RAG) patterns implemented with LangChain.Google Research 🌀 AutoBNN: Probabilistic time series forecasting with compositional bayesian neural networks. This research introduces AutoBNN, an open-source package for automated, interpretable time series forecasting using Bayesian neural networks (BNNs). It addresses limitations of traditional methods like Gaussian processes (GPs) and Structural Time Series by combining the interpretability of GPs with the scalability and flexibility of neural networks. AutoBNN automates model discovery, provides high-quality uncertainty estimates, and scales effectively for large datasets. Microsoft Research🌀 Learning from interaction with Microsoft Copilot (web): This research focuses on how AI systems like Bing and Microsoft Copilot learn and improve from user interactions, particularly through reinforcement learning from human feedback (RLHF). It also explores how Bing has evolved its search capabilities and how Copilot is changing user interactions to be more conversational and workflow oriented. The research introduces frameworks like TnT-LLM and SPUR to improve taxonomy generation and user satisfaction estimation in AI interactions. Email Forwarded? Join DataPro Here!🔍 From Bits to BERT: Keeping Up with LLMs & GPTs 🌀 Samba CoE v0.2 from SambaNova delivers accurate AI models at blazing speeds: This blog post highlights Samba's advancements in AI architecture, specifically focusing on the introduction of Samba-1, a CoE architecture for enterprise AI. It discusses the features and benefits of Samba-1, its performance benchmarks, and plans for future releases, emphasizing the role of RDUs in driving efficiency and speed in AI models. 🌀 OpenAI’s Efficient Training of Language Models to Fill in the Middle: OpenAI demonstrates that autoregressive language models can effectively learn to infill text by moving a span of text from the middle of a document to its end, without harming generative capability. They propose training models with this method by default and provide benchmarks and best practices. 🌀 Jamba: AI21's Groundbreaking SSM-Transformer Model. Jamba is a groundbreaking model that merges Mamba SSM with Transformer elements, offering a 256K context window and outperforming similar models. Released under Apache 2.0, it will be available in the NVIDIA API catalog. Jamba optimizes memory, throughput, and performance, delivering remarkable efficiency. 🌀 Databricks’ DBRX: A New State-of-the-Art Open LLM. Databricks introduces DBRX, an open LLM setting new benchmarks in language understanding, programming, and math. With a 256K context window, it outperforms GPT-3.5 and competes with Gemini 1.0 Pro. DBRX is 40% smaller than Grok-1, offering 2x faster inference than LLaMA2-70B. 🌀 Introducing Stable Code Instruct 3B — Stability AI: Stable Code Instruct 3B, built on Stable Code 3B, offers state-of-the-art performance in code completion and natural language interactions for programming tasks. It outperforms Codellama 7B Instruct and matches StarChat 15B, with a focus on popular languages like Python and Java. Available for commercial use with a Stability AI Membership, the model is accessible on Hugging Face. 🌀 HyperLLaVA: Enhancing Multimodal Language Models with Dynamic Visual and Language Experts. This blog explores the advancements in Multimodal Large Language Models (MLLMs) and introduces HyperLLaVA, a dynamic model that improves performance by adaptively tuning parameters for handling diverse multimodal tasks, surpassing existing benchmarks and opening new avenues for multimodal learning systems. ✨ On the Radar: Catch Up on What's Fresh🌀 FrugalGPT and Reducing LLM Operating Costs: The blog discusses the high cost of running Large Language Models (LLMs) and introduces the "FrugalGPT" framework, which reduces operating costs significantly while maintaining quality. It explains how different models cost different amounts and proposes using a cascade of LLMs to minimize costs while maximizing answer quality. 🌀 Leverage OpenAI Tool calling: Building a reliable AI Agent from Scratch. The blog discusses the future role of AI in everyday tasks, focusing on text creation, correction, and brainstorming. It highlights the importance of Retrieval-Augmented Generation (RAG) pipelines and aims to provide Large Language Models with better context to generate more valuable content. 🌀 Fine-tune an Instruct model over raw text data: The blog explores the challenges of integrating modern chatbots with large datasets, focusing on context window sizes and the use of Retrieval-Augmented Generation (RAG) techniques. It proposes a lighter approach to fine-tuning chatbots on smaller datasets, aiming to bridge the gap between the constraints of a 128K context window and the complexities of models fine-tuned on billions of tokens. The experiment involves fine-tuning a model on The Guardian's dataset and aims to provide reproducible instructions for cost-effective model training using accessible hardware. 🌀 How to build an OpenAI-compatible API: The blog discusses the dominance of OpenAI in the Gen AI market, and the reasons developers might choose alternative LLM providers. It explores implementing a Python FastAPI server compatible with the OpenAI API specs to wrap any LLM, aiming for flexibility and cost-effectiveness. See you next time!

0
0
22112

article-image-elevate-your-bi-dashboards-with-figma

Merlyn Shelley

28 Mar 2024

12 min read

Elevate Your BI Dashboards with Figma

Merlyn Shelley

28 Mar 2024

12 min read

Subscribe to our BI Pro newsletter for the latest insights. Don't miss out – sign up today!Partnering with Figma Want to take your BI dashboards to the next level? Figma is the way to go! It's all about ramping up the design, making things work better, and giving your Power BI projects a real boost. With Figma, you'll speed up your projects, get more creative, and see better performance. So, why not give your reports a makeover with Figma? It's where design and data come together to make a big impact! Here's what Figma offers: ✅ Figma Professional: An all-in-one tool for seamless team collaboration. ✅ FigJam: Enables real-time teamwork and brainstorming. ✅ FigJam AI: Integrates ChatGPT for smarter collaboration. Guess what? You also have the Power BI UI Kit from the Figma Community! Sign Up Now! 👋 Hello,Welcome to BI-Pro #48, your ultimate guide to data and BI insights! 🚀In this issue: 🔮 Python Data Viz Matplotlib Data Visualization Seaborn: Visualizing Data in Python Use pandas for CSV Data Visualization Guides on SQL, Python, Data Cleaning, and Analysis Build An AI App with Python in 10 Steps ⚡ Industry Highlights Power BI Hybrid Workforce Experience Report Lakeview Dashboards Overview Grouping and Binning in Power BI Desktop Dashboards in Operations Manager Microsoft Fabric Analyze Dataverse Tables Bridging Fabric Lakehouses AWS Big Data Multicloud Analytics with Amazon Athena Analyze Fastly CDN Logs with QuickSight Google Cloud Data Spark Procedures in BigQuery Gemini Pro 1.0 in BigQuery via Vertex AI ✨ Expert Insights from Packt Community Unlocking the Secrets of Prompt Engineering 💡 BI Community Scoop Creating Interactive Power BI Dashboards Using Report Templates in Power BI Desktop 10 Analytics Dashboard Examples for SaaS Future of Data Storytelling: Actionable Intelligence Power BI: Transforming Banking Data Power BI vs Tableau vs Qlik Sense | 2024 Winner Get ready to supercharge your skills with BI-Pro! 🌟 📥 Feedback on the Weekly EditionTake our weekly survey and get a free PDF copy of our best-selling book, "Interactive Data Visualization with Python - Second Edition."📣 And here's the twist – we're tuning into YOUR frequency! Inspired by a reader's request, we're launching a column just for you. Got a burning question or a topic you're itching to dive into? Drop your suggestions in our content box – because your journey of discovery is our blueprint.We appreciate your input and hope you enjoy the book!Share your thoughts and opinions here! Cheers,Merlyn ShelleyEditor-in-Chief, PacktSign Up | Advertise | Archives🚀 GitHub's Most Sought-After Repos🌀 sdv-dev/SDV: The Synthetic Data Vault (SDV) is a Python library that creates tabular synthetic data by learning patterns from real data using machine learning algorithms. 🌀 hyperspy/hyperspy: HyperSpy is a Python library for analyzing multidimensional datasets, making it easy to apply analytical procedures and access tools. 🌀 hi-primus/optimus: Optimus is a Python library for loading, processing, plotting, and creating ML models that works with pandas, Dask, cuDF, dask-cuDF, Vaex, or Spark. It simplifies data processing and offers various functions for data quality, plotting, and cross-platform compatibility. 🌀 mingrammer/diagrams: Diagrams simplifies cloud system architecture design in Python, supporting major providers and tracking changes in version control. 🌀 kayak/pypika: PyPika simplifies building SQL queries in Python with a flexible, easy-to-use interface, leveraging the builder design pattern for clean, efficient queries. Email Forwarded? Join BI-Pro Here!Partnering with Webflow Transform your BI reporting with Webflow Enterprise. Create visually stunning, scalable websites without coding, using a visual canvas.Seamlessly integrate with popular BI platforms and let Webflow handle the code.Start building smarter, faster, and more reliable websites for your data-driven decisions today! Get Started for Free! 🔮 Data Viz with Python Libraries 🌀 Matplotlib Data Visualization in Python: This blog introduces Matplotlib, a Python library for 2D visualizations, covering its capabilities and plot types like line, scatter, bar, histograms, and pie charts. It highlights Matplotlib's versatility, customization, and integration with other libraries, making it essential for data science and research. 🌀 Visualizing Data in Python With Seaborn: This article introduces the seaborn library for statistical visualizations in Python. It covers creating various plots, such as bar, distribution, and relational plots, using seaborn's functional and objects interfaces. It emphasizes seaborn's clear and concise code for effective data visualization. 🌀 Use pandas to Visualize CSV Data in Python: This blog discusses using the CData Python Connector for CSV with pandas, Matplotlib, and SQLAlchemy to analyze and visualize live CSV data in Python. It highlights the ease of integration and superior performance of the connector, along with step-by-step instructions for connecting to CSV data, executing SQL queries, and visualizing the results in Python. 🌀 Collection of Guides on Mastering SQL, Python, Data Cleaning, Data Wrangling, and Exploratory Data Analysis: This guide is tailored for business intelligence professionals new to data science, offering step-by-step instructions on mastering SQL, Python, data cleaning, wrangling, and exploratory analysis. It emphasizes practical skills for extracting insights and showcases essential tools and techniques for effective data analysis. 🌀 Build An AI Application with Python in 10 Easy Steps: This blog outlines a 10-step guide to building and deploying AI applications with Python, covering objectives, data collection, model selection, training, evaluation, optimization, web app development, cloud deployment, and sharing the AI model, with practical advice for each step. ⚡Stay Informed with Industry HighlightsPower BI 🌀 Hybrid Workforce Experience Power BI report: This tutorial explains using the Power BI Hybrid Workforce Experience report to analyze the impact of hybrid work models on employees working onsite, remotely, or in a hybrid manner. It covers setup, key metrics analysis, and improving employee experience, with prerequisites outlined. 🌀 What are Lakeview dashboards? This article discusses Lakeview dashboards, designed for creating and sharing data visualizations within teams. It highlights their advanced features, comparison with Databricks SQL dashboards, and dataset optimizations for better performance, including handling various dataset sizes and query efficiency. 🌀 Use grouping and binning in Power BI Desktop: This article explains how to use grouping and binning in Power BI Desktop to refine data visualization. Grouping allows you to combine data points into larger categories for clearer analysis, while binning lets you define the size of data chunks for more meaningful visualization. The article provides step-by-step instructions for creating, editing, and applying groups and bins to numerical and time fields, enhancing the exploration of data and trends in visuals. 🌀 Dashboards in Operations Manager: This article covers dashboard templates and widgets in Operations Manager, outlining their layouts and functions. It highlights various dashboard types, such as Service Level, Summary, and Object State, each with specific widgets. Users can create, share, and view dashboards across different consoles. Microsoft Fabric🌀 Analyze Dataverse tables from Microsoft Fabric: The article announces new features for Dynamics 365 and Power Apps customers, allowing easy integration of insights into Fabric. Users can now create shortcuts to Dataverse environments in Fabric for quick data access and analysis across multiple environments, enhancing business insights. 🌀 Bridging Fabric Lakehouses: Delta Change Data Feed for Seamless ETL. This article explains using Delta Tables and the Delta Change Data Feed in Microsoft Fabric for efficient data synchronization across lakehouses. It highlights Delta Tables' features and demonstrates updating tables across Silver and Gold Lakehouses in a medallion architecture. AWS BI 🌀 Multicloud data lake analytics with Amazon Athena: This post discusses creating a unified query interface using Amazon Athena connectors to seamlessly query across multiple cloud data stores, simplifying analytics in organizations with data spread over different clouds. It also explores managing analytics costs using Athena workgroups and cost allocation tags. 🌀 How to Analyze Fastly Content Delivery Network Logs with Amazon QuickSight Powered by Generative BI? This post discusses using Fastly, a content delivery network (CDN), to enhance web performance and security. It highlights creating a dashboard with Amazon QuickSight for analyzing CDN logs, using AWS services like S3 and Glue for data storage and cataloging. Google Cloud Data 🌀 Apache Spark stored procedures in BigQuery are GA: BigQuery now supports Apache Spark stored procedures, enabling users to integrate Spark-based data processing with BigQuery's SQL capabilities. This simplifies using Spark within BigQuery, allowing seamless development, testing, and deployment of PySpark code, and installation of necessary packages in a unified environment. 🌀 Gemini Pro 1.0 available in BigQuery through Vertex AI: This post advocates for a unified platform to bridge data and AI teams, ensuring smooth workflows from data ingestion to ML training. It introduces BigQuery ML, enabling ML model creation, training, and execution in BigQuery using SQL. It supports various models, including Vertex AI-trained ones like PaLM 2 and Gemini Pro 1.0, and enables sharing trained models, promoting governed data usage and easy dataset discovery. Gemini Pro 1.0 integration into BigQuery via Vertex AI simplifies generative AI, enhancing collaboration, security, and governance in data workflows. ✨ Expert Insights from Packt CommunityUnlocking the Secrets of Prompt Engineering - By Gilbert Mizrahi Exploring LLM parameters LLMs such as OpenAI’s GPT-4 consist of several parameters that can be adjusted to control and fine-tune their behavior and performance. Understanding and manipulating these parameters can help users obtain more accurate, relevant, and contextually appropriate outputs. Some of the most important LLM parameters to consider are listed here: Model size: The size of an LLM typically refers to the number of neurons or parameters it has. Larger models can be more powerful and capable of generating more accurate and coherent responses. However, they might also require more computational resources and processing time. Users may need to balance the trade-off between model size and computational efficiency, depending on their specific requirements. Temperature: The temperature parameter controls the randomness of the output generated by the LLM. A higher temperature value (for example, 0.8) produces more diverse and creative responses, while a lower value (for example, 0.2) results in more focused and deterministic outputs. Adjusting the temperature can help users fine-tune the balance between creativity and consistency in the model’s responses. Top-k: The top-k parameter is another way to control the randomness and diversity of the LLM’s output. This parameter limits the model to consider only the top “k” most probable tokens for each step in generating the response. For example, if top-k is set to 5, the model will choose the next token from the five most likely options. By adjusting the top-k value, users can manage the trade-off between response diversity and coherence. A smaller top-k value generally results in more focused and deterministic outputs, while a larger top-k value allows for more diverse and creative responses. Max tokens: The max tokens parameter sets the maximum number of tokens (words or subwords) allowed in the generated output. By adjusting this parameter, users can control the length of the response provided by the LLM. Setting a lower max tokens value can help ensure concise answers, while a higher value allows for more detailed and elaborate responses. Prompt length: While not a direct parameter of the LLM, the length of the input prompt can influence the model’s performance. A longer, more detailed prompt can provide the LLM with more context and guidance, resulting in more accurate and relevant responses. However, users should be aware that very long prompts can consume a significant portion of the token limit, potentially truncating the model’s output. Discover more insights from 'Unlocking the Secrets of Prompt Engineering' by Gilbert Mizrahi. Unlock access to the full book and a wealth of other titles with a 7-day free trial in the Packt Library. Start exploring today! Read Here💡 What's the Latest Scoop from the BI Community? 🌀 Creating Interactive Power BI Dashboards That Engage Your Audience: This blog discusses the challenges faced by stakeholders and clients unfamiliar with using dashboards, preferring traditional tools like Excel. It emphasizes the importance of creating user-friendly and interactive dashboards to bridge this gap, offering techniques to enhance engagement and accessibility.🌀 Create and use report templates in Power BI Desktop: This tutorial explains how to create and use report templates in Power BI Desktop, enabling users to streamline report creation and standardize layouts, data models, and queries. Templates, saved with the .PBIT extension, help jump-start and share report creation processes across an organization. 🌀 10 Analytics Dashboard Examples to Gain Data Insights for SaaS: This article discusses the importance of analytics dashboards in simplifying the tracking of SaaS metrics and extracting insights. It provides 10 examples of analytics dashboards, including web, digital marketing, and user behavior, and highlights the top 5 analytics tools. The article emphasizes the need for clear, customizable, and intuitive dashboards for effective decision-making. 🌀 The Future of Data Storytelling: Actionable Intelligence [AI, Power BI, and Office]: This blog post discusses Zebra BI's solutions for reporting, planning, and presenting, emphasizing the importance of clarity, consistency, and actionability in data visualization. It introduces the concept of a reporting-planning-presenting cycle and highlights upcoming features and innovations, including the integration of AI. The post also mentions Zebra BI's adherence to the IBCS standard for clear and consistent business communication. 🌀 Power BI: Transforming Banking Data. This blog post discusses how Power BI can help banks analyze complex data for better decision-making. It covers challenges in banking, how Power BI integrates data sources, develops dashboards, and optimizes analytics. Benefits include improved operations, customer experience, risk management, and cost savings. 🌀 Power BI vs Tableau vs Qlik Sense | Which Wins In 2024? This blog compares Power BI, Tableau, and Qlik Sense for business intelligence (BI) and analytics. It highlights Power BI's advantages in data management, Tableau's strong visualization capabilities, and Qlik Sense's modern self-service platform. The article concludes with a comparison of features and recommendations for different needs. See you next time!Affiliate Disclosure: This newsletter contains affiliate links. If you buy through them, we may earn a small commission at no extra cost to you. This supports our work and helps us keep providing useful content. We only recommend products and services we think will benefit our readers. Thanks for your support!

0
0
30729

Merlyn Shelley

26 Mar 2024

14 min read

Transforming Web Data with Browse AI

Merlyn Shelley

26 Mar 2024

14 min read

Subscribe to our Data Pro newsletter for the latest insights. Don't miss out – sign up today!Partnering with Browse AI Turn Web Data into Your Business Superpower!👉 Train a robot in 2 minutes, no coding needed. 🤖 👉 Ideal for web scraping and data monitoring. 🌐 Here’s what you get: Monitor Websites for Changes ✅ Download Data from Any Website ✅ Turn Any Website into an API ✅ Product data extraction ✅ Also, extract data from news, stocks, jobs, social media, and more. Check out this 1-minute explainer video on how to extract data to Excel, Airtable, and connect to 5,000+ apps using Zapier! Start for free with up to 50 credits, and for a limited time, enjoy free setup and onboarding for Team and Company plans, saving up to 20% on Annual plans. Get Scraping Today!👋 Hello,Welcome to DataPro#85 – Your one-stop shop for the latest in Data Science and ML Algorithms! 🚀 In this issue:⚙️ Keeping Up with LLMs & GPTs Meet Devin: The pioneering AI software engineer. Google's Croissant: A fresh take on metadata for ML-ready datasets. INSTRUCTIR by Kaist AI: Setting new standards in instruction-following for information retrieval models. Spyx by Sussex AI: Turbocharging spiking neural networks with just-in-time compiled optimization. SynCode by VMware: Enhancing LLM code generation with a touch of grammar. Chatbot Arena: The ultimate battleground for evaluating LLMs by human preference. Apollo: Bringing medical AI to the masses with a multilingual medical LLM. ✨ On the RadarTop AI tools for code generation in 2024. Setting up a Pypi mirror in AWS with Terraform. Ensuring safer code changes with custom pre-commit hooks. Deciphering the AQLM Quantization Algorithm. AI's role in revolutionizing web browsing. Tackling tensors through three tricky errors. Running RStudio inside a container. Harnessing PyTorch and MLX for Apple Silicon. 🏭 Industry Highlights Google Research: Boosting LLMs with Cappy, evolving tables with Chain-of-table, and Scalable Instructable Multiworld Agent (SIMA). AWS: Streamlining code review with generative AI using Amazon Bedrock. OpenAI Updates: Leadership continuity and global news partnerships. 📚 New in Packt Library Practical Guide to Applied Conformal Prediction in Python by Valery Manokhin. DataPro Newsletter is not just a publication; it’s a comprehensive toolkit for anyone serious about mastering the ever-changing landscape of data and AI. Grab your copy and start transforming your data expertise today! 📥 Feedback on the Weekly EditionTake our weekly survey and get a free PDF copy of our best-selling book, "Interactive Data Visualization with Python - Second Edition."We appreciate your input and hope you enjoy the book!Share your Feedback!Cheers,Merlyn ShelleyEditor-in-Chief, PacktSign Up | Advertise | Archives🔰 GitHub Finds: Any of These Repos in Your Toolbox?🛠️ deepseek-ai/DeepSeek-VL: Open-source Vision-Language (VL) model for real-world tasks, handling logical diagrams, web pages, formulas, scientific literature, and more. 🛠️ OpenGVLab/VideoMamba: VideoMamba enhances 3D CNNs and video transformers, excelling in long-term video understanding with scalability and modality compatibility. 🛠️ showlab/DragAnything: DragAnything uses entity representation for motion control in video generation, offering user-friendly interaction and outperforming existing methods. 🛠️ pkunlp-icler/FastV: FastV accelerates large vision language models by pruning redundant visual tokens, achieving 45% FLOPs reduction without performance loss. 🛠️ cnulab/RealNet: RealNet introduces SDAS for anomaly strength control, AFS for feature selection, and RRS for anomaly region identification. Partnering with SurfsharkSurfshark is allowing our readers to enjoy a full 2 years of their award-winning VPN protection for 79% off, plus 2 months free. With Surfshark One, you get: Unlimited devices and connections ✅ One account for the entire household ✅ Your online activity, made safe, secure, and invisible ✅ Plus, identity protection, ad blocking, antivirus, and data breach monitoring.Claim your VPN protection today! 📚 Expert Insights from Packt CommunityPractical Guide to Applied Conformal Prediction in Python - By Valery Manokhin Basic components of a conformal predictor We will now look at the basic components of a conformal predictor: Nonconformity measure: The nonconformity measure is a function that evaluates how much a new data point differs from the existing data points. It compares the new observation to either the entire dataset (in the full transductive version of conformal prediction) or the calibration set (in the most popular variant – ICP. The selection of the nonconformity measure is based on a particular machine learning task, such as classification, regression, or time series forecasting, as well as the underlying model. This will examine several nonconformity measures suitable for classification and regression tasks. Calibration set: The calibration set is a portion of the dataset used to calculate nonconformity scores for the known data points. These scores are a reference for establishing prediction intervals or regions for new test data points. The calibration set should be a representative sample of the entire data distribution and is typically randomly selected. The calibration set should contain a sufficient number of data points (at least 500). If the dataset is small and insufficient to reserve enough data for the calibration set, the user should consider other variants of conformal prediction – including TCP (see, for example, Mastering Classical Transductive Conformal Prediction in Action – https://medium.com/@valeman/how-to-use-full-transductive-conformal-prediction-7ed54dc6b72b). Test set: The test set contains new data points for generating predictions. For every data point in the test set, the conformal prediction model calculates a nonconformity score using the nonconformity measure and compares it to the scores from the calibration set. Using this comparison, the conformal predictor generates a prediction region that includes the target value with a user-defined confidence level. All these components work in tandem to create a conformal prediction framework that facilitates valid and efficient uncertainty quantification in a wide range of machine learning tasks. Discover more insights from 'Practical Guide to Applied Conformal Prediction in Python' by Valery Manokhin. Unlock access to the full book and a wealth of other titles with a 7-day free trial in the Packt Library. Start exploring today! Read Here!⚡ Tech Tidbits: Stay Wired to the Latest Industry Buzz! AWS ML Made Easy 🌀 Enhance code review and approval efficiency with generative AI using Amazon Bedrock: This post discusses the challenges faced by managers in overseeing code review and approval processes in software development, such as lack of technical expertise, time constraints, volume of change requests, manual effort, and the need for documentation. It also introduces a solution that leverages generative artificial intelligence and integrates it with AWS deployment tools to streamline the review and approval process. The solution includes automated change analysis, summarization, and an approval workflow. Google Research 🌀 Cappy: Outperforming and boosting large multi-task language models with a small scorer. This blog discusses advancements in large language models (LLMs) and their use in natural language processing (NLP). It introduces the concept of multi-task LLMs, such as T0, FLAN, and OPT-IML, which excel at understanding and solving various tasks. It also presents a new approach called Cappy, a lightweight pre-trained scorer that enhances the performance and efficiency of multi-task LLMs. 🌀 Chain-of-table: Evolving tables in the reasoning chain for table understanding. This research focuses on improving how large language models (LLMs) reason over tabular data, which is challenging due to the structured nature of tables. The proposed framework, Chain-of-Table, trains LLMs to iteratively update tables, mimicking human reasoning, resulting in improved performance on table understanding tasks. 🌀 Talk like a graph: Encoding graphs for large language models. This research explores how to teach large language models (LLMs) to reason with graph information, crucial for understanding interconnected data. They introduce GraphQA, a benchmark to evaluate LLMs on graph problems, revealing insights into effective graph encoding methods and improving LLM performance on graph tasks by up to 60%. 🌀 Scalable Instructable Multiworld Agent (SIMA): A generalist AI agent for 3D virtual environments. Google DeepMind has developed SIMA, a versatile AI agent trained on multiple video games to follow natural-language instructions, akin to human behavior. Collaborating with game studios, SIMA navigates various environments, showcasing potential for AI to understand and execute diverse tasks. OpenAI Updates 🌀 Review completed & Altman, Brockman to continue to lead OpenAI: The OpenAI Board completed a review by WilmerHale, expressing full confidence in Sam Altman and Greg Brockman's leadership. They also elected new board members and adopted governance enhancements. WilmerHale's review found a breakdown in trust between the prior Board and Mr. Altman, leading to his removal, but concluded that his conduct did not mandate removal. Following the review, the Board endorsed the decision to rehire Mr. Altman and Mr. Brockman. 🌀 Global news partnerships: Le Monde and Prisa Media: OpenAI has partnered with Le Monde and Prisa Media to bring French and Spanish news content to ChatGPT. This partnership aims to enhance user interaction with news content and contribute to the training of OpenAI's models. Through these partnerships, users will access summaries and links to original articles, expanding their news consumption experience. This collaboration supports the news industry and its role in providing reliable information globally. Email Forwarded? Join DataPro Here!🔍 From Bits to BERT: Keeping Up with LLMs & GPTs 🌀 Introducing Devin, the first AI software engineer: Meet Devin, the autonomous AI software engineer, skilled in long-term reasoning and planning. Devin can learn new technologies, build and deploy apps, find and fix bugs, train AI models, and contribute to open source. Devin excels in resolving real-world GitHub issues, outperforming previous models. Cognition, the AI lab behind Devin, aims to unlock new possibilities beyond coding. 🌀 Google’s Croissant: a metadata format for ML-ready datasets. Croissant is a new metadata format for ML datasets, aiming to simplify the use of existing datasets for training ML models. It standardizes dataset descriptions and organization, supporting responsible AI practices. Croissant builds upon schema.org and is supported by major tools and repositories like Kaggle, Hugging Face, and OpenML. It includes a specification, example datasets, a Python library, and a visual editor to facilitate dataset usage and publication. 🌀 Kaist AI’s INSTRUCTIR: A Benchmark for Instruction Following of Information Retrieval Models. This research focuses on enhancing search accuracy by improving retrievers to understand users' intentions, similar to language models. It introduces INSTRUCTIR, a benchmark for evaluating retrievers' ability to follow user-aligned instructions in retrieval tasks. The study addresses limitations in existing benchmarks and highlights potential overfitting issues in instruction-aware retrieval datasets. 🌀 Sussex AI’s Spyx: A Library for Just-In-Time Compiled Optimization of Spiking Neural Networks. Advancements in large neural architectures have led to powerful AI accelerators for training deep neural networks. However, these networks often incur high costs. Neuromorphic computing with Spiking Neural Networks (SNNs) offers energy-efficient alternatives, but training SNNs is challenging. Spyx, a new lightweight SNN simulation and optimization library designed in JAX, aims to facilitate SNN architecture investigation by bridging Python-based deep learning frameworks with custom compute kernels, achieving optimal hardware utilization. 🌀 VMware’s SynCode: Improving LLM Code Generation with Grammar Augmentation. SynCode is a novel framework for efficient syntactical decoding of code with large language models (LLMs). It leverages grammar of a programming language using an offline-constructed efficient lookup table called Deterministic Finite Automaton (DFA) mask store. SynCode seamlessly integrates with any context-free grammar (CFG) defined language, reducing syntax errors by 96.07% when combined with LLMs. 🌀 Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference. Chatbot Arena is an open platform designed to evaluate Large Language Models (LLMs) by considering human preferences. Utilizing a pairwise comparison method and crowdsourced input, it assesses LLMs' alignment with user preferences. The platform, operational for months with over 240K votes, provides a credible and valuable resource for ranking LLMs. Check out the tool here. 🌀 Apollo: A Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People. The project aims to develop medical Large Language Models (LLMs) in the six most spoken languages, benefiting 6.1 billion people. This includes creating the ApolloCorpora multilingual medical dataset and the XMedBench benchmark, with Apollo models achieving top performance among models of similar sizes. The project will open-source training data, code, model weights, and evaluation benchmarks. You can check for the demo here. ✨ On the Radar: Catch Up on What's Fresh🌀 Top Artificial Intelligence (AI) Tools That Can Generate Code To Help Programmers (2024): The article discusses how AI is changing programming, with tools like OpenAI Codex and GitHub Copilot generating code. It explores AI's impact on code quality and development speed, showcasing various AI-powered tools like Tabnine, CodeT5, and Polycoder. Additionally, it mentions AI tools for code review, static code analysis, and AI-assisted coding in IDEs like PyCharm and Visual Studio. 🌀 Pypi mirror in a private AWS environment Terraform: This article explains how to install Python packages in an AWS Sagemaker Studio environment without internet access. It covers setting up Sagemaker in VPC Only mode, using VPC Endpoint interfaces for network communications, and accessing the Pypi package repository through AWS Codeartifact, which allows defining Pypi as an upstream repository. 🌀 Custom pre-commit hooks for safer code changes: This blog post explains the importance of using pre-commit hooks in software development, particularly with the git version control system. It discusses the challenges of maintaining coding standards in collaborative projects and provides a step-by-step tutorial on how to set up and use custom pre-commit hooks for a Python project, using the example of validating dataflow definitions for the Hamilton library. 🌀 AQLM Quantization Algorithm, explained: A new quantization algorithm, AQLM (Additive Quantization of Language Models), was recently released and integrated into HuggingFace Transformers and HuggingFace PEFT. AQLM sets a new state-of-the-art for 2-bit quantization while providing improvements for 3-bit and 4-bit ranges, pushing the boundaries of model accuracy and memory footprint. 🌀 Revolutionize Web Browsing with AI: This article explores creating an AI agent using the gpt-4-vision-preview model from OpenAI, enabling it to navigate the web like a human. It discusses the agent's browser control, content browsing, and decision-making processes, showcasing potential use cases such as aiding visually challenged users and automating web browsing tasks. 🌀 Understanding Tensors: Learning a Data Structure Through 3 Pesky Errors. This article discusses transitioning from managing tabular data to working with tensors in TensorFlow, offering debugging tips and code recipes. It covers visualizing TensorFlow datasets, understanding tensor specs, and augmenting model summaries, while addressing common errors related to tensor rank and shape. 🌀 Running RStudio Inside a Container: This tutorial focuses on setting up RStudio using Docker, particularly leveraging the Rocker RStudio image. It covers pulling the image, launching RStudio in a container, and ensuring persistence of data by using volume mapping. The tutorial provides step-by-step instructions and explanations for each stage. 🌀 PyTorch and MLX for Apple Silicon: The blog discusses Apple's MLX framework, which is optimized for Apple Silicon and serves as a bridge between PyTorch, NumPy, and Jax. It details a comparison between MLX and PyTorch through a custom convolutional neural network implementation for image classification tasks. The discussion includes insights into MLX's features, such as its array class, lazy computation, and compilation for performance optimization. The post also highlights the ease of converting PyTorch code to MLX, despite some differences in API compatibility and coding conventions. See you next time!Affiliate Disclosure: This newsletter contains affiliate links. If you buy through them, we may earn a small commission at no extra cost to you. This supports our work and helps us keep providing useful content. We only recommend products and services we think will benefit our readers. Thanks for your support!

0
0
33695

article-image-chatgpt-for-data-governance

Jyoti Pathak

22 Mar 2024

11 min read

ChatGPT for Data Governance

Jyoti Pathak

22 Mar 2024

11 min read

0
0
64268

article-image-ai-distilled-39-unpacking-mistral-large-googles-gemini-challenges-and-copilot-enterprise

Kartikey Pandey

21 Mar 2024

9 min read

AI_Distilled #39: Unpacking Mistral Large, Google's Gemini Challenges, and Copilot Enterprise

Kartikey Pandey

21 Mar 2024

9 min read

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!Print to Pixel: Optimize your learning experience with PacktSeveral research studies have proven that printed books enhance comprehension, with the tactile experience of flipping pages and annotating the margins adding depth to the learning experience. However, developers can't overlook the practical benefits of eBooks, such as quickly finding relevant information or carrying an entire library on a single device.Acknowledging the unique benefits of both formats, Packt is offering a 40% discount on all print books, plus a free eBook version of each purchase, from February 26th to February 29th.Here’s what’s included:A Vast Library: Enjoy 40% off on over 5,000 titles spanning topics from Cybersecurity to Generative AI.Complimentary eBook: Each print book purchase includes a free eBook.AI Assistant: Top 500 books come with a personalized AI that can simply complex topics to your learning style, offering an interactive learning experience.Start Building Your Tech Library Today!👋 Hello,“No Al is perfect, especially at this emerging stage of the industry’s development, but we know the bar is high for us and we will keep at it for however long it takes.”-Sundar Pichai, Google CEOPichai acknowledges problems with Gemini AI, stressing the importance of unbiased information for users, and outlining steps to address issues and improve products. A rapidly progressing industry, AI development is a tricky game to master, with numerous pitfalls along the way.Greetings readers! Our mission is to help you stay on top of the ever-changing AI landscape so you can advance your skills. Let’s get started with the latest news and developments across the AI field:Microsoft provides new LLM Mistral Large on Azure with Mistral AIGoogle accepts some responses from their Gemini were unacceptable and biasedGitHub has launched Copilot Enterprise coding assistant integrating throughout the software development processResearchers developed new optimized language models called MobileLLM for mobile devices with under a billion parametersResearchers at Microsoft have developed new techniques to improve visual language modelsWe’ve also got you your fresh dose of GPT and LLM secret knowledge and tutorials:Mastering the Art of Prompt CraftingBreaking Down How Large Language Models LearnUsing AI to Level Up Live GamesMonitoring Large Language Models on AWSLast but not least, don’t miss out on the hands-on strategies and tips straight from the AI community for you to use on your own projects:Fine-Tuning Models for Speech Recognition Made SimpleMake Conversation Come Alive - Deploying Your Own AI Chat PartnerCombining Geospatial and Semantic Data to Build Powerful Search ToolsLeveraging Notion, Supabase and AI for Knowledge RetrievalWriter’s Credit: Special shout-out to Vidhu Jain for her valuable contribution to this week’s issue.Cheers, Kartikey Pandey Editor-in-Chief, Packt Unleash Your Data Potential with Packt's Latest Titles and Platform Enhancements! In a world that's always changing, learning is key to success. At Packt, we've updated our learning platform to help you stay ahead in the fast-moving tech world. Our platform makes learning easier and more effective, helping you overcome challenges and achieve your goals. Boost Your Data Skills with Packt's DataPro Library: On-Demand Learning: Access a wide range of books, video courses, research papers, and articles to help you grow. AI Assistance: Get help from AI to understand complex concepts easily, all within the same learning environment.Personalized Dashboard: Enjoy a tailored learning experience with recommendations and insights just for you. Advanced Self-Assessment: Use the latest tools to identify what you need to learn and track your progress accurately. Vibrant Community: Join a community of data and AI enthusiasts on Discord for collaboration and knowledge sharing. Exclusive Access: Be part of the DataPro beta program for a chance to win Amazon gift cards and early access to new features. Value for Money: Get all these benefits for just $7.99 per month, a small investment for big gains in your careerEnhance Your Data Skills Today⚡ TechWave: AI/GPT News & AnalysisMicrosoft has partnered with Mistral AI to provide their new LLM Mistral Large on Azure cloud services. This state-of-the-art AI model offers advanced NLP capabilities. Several companies have praised Mistral Large's performance in increasing productivity and aiding innovation.Google's CEO recently said some responses from their AI model Gemini were unacceptable and biased. The company has been working to address these issues and sees improvements but will review what happened. They plan to relaunch Gemini in the coming weeks after fixing it.GitHub has launched Copilot Enterprise, an AI coding assistant that integrates throughout the software development process. It provides customized code suggestions based on an organization's codebase, answers questions about internal systems, and generates summaries of code changes. Early testing found massive productivity gains from such AI tools.Researchers have developed new optimized language models for mobile devices with under a billion parameters. Called MobileLLM, the models achieve higher accuracy than previous smaller models through innovative architecture and weight-sharing techniques. MobileLLM shows significant gains on conversation tasks and competes with much larger models for common on-device uses.Researchers at Microsoft have developed new techniques to improve visual language models using structured knowledge graphs. By incorporating relationship maps between image elements like objects and attributes, models can generate richer images from text descriptions. Hierarchical prompting and dual-path encoding methods were also introduced to help models better understand complex language.🌟 Secret Knowledge: AI/LLM Resources🌀 Mastering the Art of Prompt Crafting: Got a new NLP project that needs prompting? This guide covers the basics of effective prompt engineering for AI models like ChatGPT. Learn how clarity, conciseness, and context can improve responses. Also explore techniques like zero-shot learning and dynamic few shots, plus how temperature, top-p, and other settings can refine your model's "personality". From system messages to tailoring examples, these tips will help you leverage your LLMs' full potential.🌀 Breaking Down How Large Language Models Learn: This article provides a helpful breakdown of how LLMs are trained through causal language modeling and calculates loss. It visually explains how models generate text sequences, are pre-trained to predict the next token, and how cross-entropy loss compares predictions to true labels to update weights. The process is demonstrated through code showing how loss is manually calculated for an LLM matching the framework's automatic calculation. This gives developers valuable insights into how state-of-the-art models learn.🌀 Using AI to Level Up Live Games: This article discusses how generative AI can enhance live service games. Techniques like adaptive gameplay, personalized ads, and faster asset creation are described. The authors provide a framework for developing games using tools like Unity, GKE, and Vertex AI. They demonstrate how ML models can dynamically generate images, code and dialogue to customize the player experience. Whether deploying models on GKE or Vertex, cloud-based AI brings the benefits of lower costs and easier maintenance than self-hosted options. 🌀 Monitoring Large Language Models on AWS: As AI language models grow more advanced, ensuring they behave properly becomes more important. This article discusses techniques for monitoring LLMs deployed on AWS. Key metrics covered include semantic similarity of responses, sentiment analysis, refusal rates, and more. The proposed architecture takes in model outputs, runs metrics modules, and reports results to CloudWatch for aggregation and alerts. With the right monitoring in place, you can help keep your conversational AI acting as intended.🔛 Masterclass: AI/LLM Tutorials🌀 Fine-Tuning Models for Speech Recognition Made Simple: This article discusses how to fine-tune LLMs for automatic speech recognition tasks using Amazon SageMaker. It explains language models and ASR as well as the basic steps for fine-tuning a pre-trained model which includes preparing data, choosing a model, training, evaluating, and deploying. SageMaker is highlighted as a powerful yet easy-to-use platform for this process due to its scalability, integration with AWS services, and pay-as-you-go pricing.🌀 Make Conversation Come Alive - Deploying Your Own AI Chat Partner: Tired of boring chatbots? This guide shows you how to bring the amazing Qwen AI model to your own server so you can have engaging discussions on any topic. The steps cover setting up your environment, installing dependencies, initializing the tokenizer and model, and using history to keep conversations flowing naturally. Once complete, you'll have a powerful AI assistant right at your fingertips. Best of all, it's completely open source.🌀 Combining Geospatial and Semantic Data to Build Powerful Search Tools: This guide shows developers how to create an interactive campground search map using vector databases, NLP models, and geospatial data. Technologies like Qdrant, Llama2, and Streamlit allow embedding text and locations to enable semantic queries. The page explains setting up Qdrant cloud, loading campground CSV data, and parsing text into nodes. Developers can then embed nodes with HuggingFace and query the vector store to retrieve similar results. By leveraging tools that understand both spatial and semantic context, you can build customized applications to help users explore outdoor destinations.🌀 Leveraging Notion, Supabase, and AI for Knowledge Retrieval: This tutorial shows how you can build a knowledge base by extracting data from Notion databases and storing it in a vector format in Supabase. It then demonstrates retrieving relevant information from the knowledge base using an AI model from OpenAI. By combining these tools, developers can query custom datasets and generate responses based on retrieved documents. The process involves loading Notion documents, storing embeddings in Supabase, and setting up a retrieval pipeline. With some enhancements, this could be a powerful way to access organizational information.🚀 HackHub: Trending AI Tools🌀 lucky-lance/expert_sparsity: Implements efficient expert pruning and dynamic skipping techniques for mixture-of-experts large language models to improve their efficiency and speed while maintaining strong performance.🌀 facebookresearch/pearl: This open-source library provides a modular reinforcement learning framework for building and training production-ready AI agents, empowering developers with state-of-the-art techniques.🌀 zhen-tan-dmml/llm4annotation: Curates papers on using LLMs for data annotation, which developers could reference to apply these techniques or learn about the current state of the art.🌀 google/gemma.cpp: Provides a lightweight C++ library for running Google's Gemma models that developers can easily integrate into their own projects for experimenting with and deploying LLMs.

0
0
56021

article-image-get-started-with-fabric-create-your-workspace-reports

Arshad Ali, Bradley Schacht

14 Mar 2024

9 min read

Get Started with Fabric: Create Your Workspace & Reports

Arshad Ali, Bradley Schacht

14 Mar 2024

9 min read

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!This article is an excerpt from the book, Learn Microsoft Fabric, by Arshad Ali, Bradley Schacht. Harness the power of Microsoft Fabric to develop data analytics solutions for various use cases guided by step-by-step instructionsIntroductionEmbark on a journey to harness the full potential of Microsoft Fabric within Power BI. This article serves as your comprehensive guide, walking you through the essential steps to create your first Fabric workspace seamlessly. From understanding the fundamentals to practical implementation, we'll equip you with the knowledge and tools needed to optimize your data management and reporting processes. Get ready to elevate your Power BI experience and unlock new possibilities with Fabric-enabled workspaces.Creating your first Fabric-enabled workspaceOnce you have confirmed that Fabric is enabled in your tenant and you have access to it, the next step is to create your Fabric workspace. You can think of a Fabric workspace as a logical container that will contain items such as lakehouses, warehouses, notebooks, and pipelines. Follow these steps to create your first Fabric workspace:1. Sign into Power BI (https://app.powerbi.com/).2. Select Workspaces | + New workspace:Figure 2.5 – Creating a new workspace3. Fill out the Create a workspace form, as follows:Name: Enter Learn Microsoft Fabric and some characters for uniqueness.Description: Optionally, enter a description for the workspace:Figure 2.6 – Create a workspace – detailsAdvanced: Select Fabric capacity under License mode and then choose a capacity you have access to. If not, you can start a trial license, as described earlier, and use it here.4. Select Apply. Th e workspace will be created and opened.5. You can click on Workspaces again and then search for your workspace by typing its name in the search box. You can also pin the selected workspace so that it always appears at the top:Figure 2.7 – Searching for a workspace6. Clicking on the name of the workspace will open that workspace. A link to it will become available in the left-hand side navigation bar, allowing you to switch from one item to another quickly. Since we haven’t created anything yet, there is nothing here. You can click on +New to start creating Fabric items:Figure 2.8 – Switching to a workspaceWith a Microsoft Fabric workspace set up, let’s review the different workloads that are available.Copilot in Power BIPower BI has several key components, including data transformation and data modeling, culminating in a visual report that end users will consume. The Copilot experience is centered around the visual storytelling and reporting aspects of Power BI. This materializes in three ways: report page creation, narrative generation, and improving Q&A.Let’s look at each of these Copilot capabilities.Creating reports with the Power BI CopilotThe most common use for Copilot with Power BI is likely to be for creating reports. There are two features that come together to build reports. The first analyzes the dataset to suggest content for your report by using table relationships and column names, while the second one helps you create intuitive reports quickly. Figure 11.30 shows an example where Copilot has suggested several report pages, each with a short description of what would be displayed:Figure 11.30 – The Power BI Copilot page suggestionsIf you like the page suggestions, simply click on the Create button and the report page will appear.While a suggested set of report content is a good starting point, analysts often have a specific need to meet. You can have Copilot create a report from the criteria you provide using prompts as well. These can be as simple as “create a page that shows customer analysis” or more specific, such as “create a page to show the impact of each sales territory on profit and quantity sold.”Figure 11.31 – Sales impact report created by CopilotOnce the report page is generated, Copilot cannot update the report, but you can interact with and modify the report as necessary. This is a great way to reduce the time to get started building reports.A couple of other important things to note are that in addition to not being able to modify reports, Copilot will not allow you to specify specific visual types, apply filters, or change the report layout. All of these can be changed manually after the initial report generation. It is worth noting that users should not expect Copilot to filter results to a specific time period based on their prompt as an example.Next, let’s look at the smart narrative.Creating a narrative using CopilotVisuals are a wonderful way to tell a story and give users the ability to explore data on their own. However, sometimes a narrative that summarizes what is being displayed in a report can be useful. It can not only tell a story but also provide some additional context and information for users.To get started, open a report and add a narrative visualization to the report as shown in Figure 11.32. You will see two options; click on Copilot. Choose the type of summary you wish to produce and optionally select specific pages or visuals to include in the summary. Then click on Create.Figure 11.32 – The report narrative generated by CopilotAfter the narrative is generated, remember to always review the narrative for accuracy and adjust the prompt, if necessary, to produce more accurate results. In addition to summaries, you can ask it to highlight key information, customize the order in which the data is described to help convey importance, specify specific data points to include in the summary, and even generate impact analysis showing how different factors affect metrics on the report.Report, page, and visual narratives are a great way to guide users through a report, especially if there isn’t a subject matter expert there to explain all the data.Finally, let’s look at using Copilot to improve the Q&A visual.Generating synonyms with CopilotThe Q&A visual has been dazzling users for years at this point. It is impressive to build a model, walk into the room, and tell users that they can use natural language to query their data without needing to build any visuals. This may not be as impressive as the Copilot functionality that we have today, but it is still a very useful tool in your Power BI visualization toolbelt.One piece of important information for the success of Q&A is something called a synonym. These are end-user-specific ways to reference data. For example, a table in the data model may be called Dim Person, but you know that some report consumers always refer to these as “users.” Therefore, you would create a synonym that tells Q&A that when someone asks about users, they are really talking about persons. This can also be done on a column level. A synonym for “postal code” could be “zip code,” while a synonym for an “item” could be “product” or “finished good.”Q&A itself may not use Copilot, but Power BI Desktop can leverage Copilot to generate synonyms. This can be done when creating a new Q&A visual by clicking on Add synonyms from the ribbon with the label Improve Q&A with synonyms from Copilot. They can also be generated from the Q&A settings menu by adding Copilot as a source from the Suggestion settings list.The more synonyms that can be used to describe your data, the more likely you are to produce quality Q&A results. It is important to double-check the synonyms generated by Copilot to ensure they line up with your specific business terminology.With these Copilot experiences for Power BI, you will be able to generate report ideas, report pages and visuals, summaries, and narratives, and improve Q&A.ConclusionIn conclusion, by mastering the creation of Fabric workspaces in Power BI, you've laid a solid foundation for efficient data management and reporting. With Fabric's capabilities at your fingertips, you're equipped to streamline workflows, generate insightful reports, and enhance collaboration within your organization. Keep exploring the diverse functionalities of Fabric to continuously refine your Power BI experience and stay ahead in the realm of data analytics.Author bioArshad Ali is a principal product manager at Microsoft, working on the Microsoft Fabric product team in Redmond, WA. He focuses on Spark Runtime, which empowers both data engineering and data science experiences. In his previous role, he helped strategic customers and partners adopt Azure Synapse and Microsoft Fabric.Arshad has more than 20 years of industry experience and has been with Microsoft for over 16 years. He is the co-author of the book Big Data Analytics with Azure HDInsight and the author of over 200 technical articles and blogs on data and analytics. Arshad holds an MBA from the Foster School of Business at the University of Washington and an MCA from India.Bradley Schacht is a principal program manager on the Microsoft Fabric product team based in Saint Augustine, Florida. Bradley is a former consultant and trainer and has co-authored five books on SQL Server and Power BI. As a member of the Microsoft Fabric product team, Bradley works directly with customers to solve some of their most complex data problems and helps shape the future of Microsoft Fabric. Bradley gives back to the community by speaking at events, such as the PASS Summit, SQL Saturday, Code Camp, and user groups across the country, including locally at the Jacksonville SQL Server User Group (JSSUG). He is a contributor on SQLServerCentral and blogs on his personal site, BradleySchacht.

0
0
18462

article-image-enhancing-image-search-with-vector-similarity

Bahaaldine Azarmi, Jeff Vestal

12 Mar 2024

12 min read

Enhancing Image Search with Vector Similarity

Bahaaldine Azarmi, Jeff Vestal

12 Mar 2024

12 min read

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!This article is an excerpt from the book, Vector Search for Practitioners with Elastic, by Bahaaldine Azarmi and Jeff Vestal. Optimize your search capabilities in Elastic by operationalizing and fine-tuning vector search and enhance your search relevance while improving overall search performanceIntroductionVector similarity search plays a crucial role in image search. After images are transformed into vectors, a search query (also represented as a vector) is compared against the database of image vectors to find the most similar matches. This process is known as k-Nearest Neighbor (kNN) search, where “k” represents the number of similar items to retrieve.Several algorithms can be used for kNN search, including brute-force search and more efficient methods such as the Hierarchical Navigable Small World (HNSW) algorithm (see Chapter 7, Next Generation of Observability Powered, by Vectors for a more in-depth discussion on HNSW). Bruteforce search involves comparing the query vector with every vector in the database, which can be computationally expensive for large databases. On the other hand, HNSW is an optimized algorithm that can quickly find the nearest neighbors in a large-scale database, making it particularly useful for vector similarity search in image search systems.The tangible benefits of image search are observed across industries. Its flexibility and adaptability make it a tool of choice for enhancing user experiences, ensuring digital security, or even revolutionizing digital content interactions.Image search in practiceApplications of image search are varied and far-reaching. In e-commerce, for example, reverse image search allows customers to upload a photo of a product and find similar items for sale. In the field of digital forensics, image search can be used to find visually similar images across a database to detect illicit content. It is also used in the realm of social media for face recognition, image tagging, and content recommendation.As we continue to generate and share more visual content, the need for effective and efficient image search technology will only grow. The combination of artificial intelligence, machine learning, and vector similarity search provides a powerful toolkit to meet this demand, powering a new generation of image search capabilities that can analyze and understand visual content.Traditionally, image search engines use text-based metadata associated with images, such as the image’s filename, alt text, and surrounding text context, to understand the content of an image. This approach, however, is limited by the accuracy and completeness of the metadata, and it fails to analyze the actual visual content of the image itself.Over time, with advancements in artificial intelligence and machine learning, more sophisticated methods of image search have been developed that can analyze the visual content of images directly. This technique, known as content-based image retrieval (CBIR), involves extracting feature vectors from images and using these vectors to find visually similar images.Feature vectors are a numerical representation of an image’s visual content. They are generated by applying a feature extraction algorithm to the image. The specifics of the feature extraction process can vary, but in general, it involves analyzing the image’s colors, textures, and shapes. In recent years, CNNs have become a popular tool for feature extraction due to their ability to capture complex patterns in image data.Once feature vectors have been extracted from a set of images, these vectors can be indexed in a database. When a new query image is submitted, its feature vector is compared to the indexed vectors, and the images with the most similar vectors are returned as the search results. The similarity between vectors is typically measured using distance metrics such as Euclidean distance or cosine similarity.Despite the impressive capabilities of CBIR systems, there are several challenges in implementing them. For instance, interpreting and understanding the semantic meaning of images is a complex task due to the subjective nature of visual perception. Furthermore, the high dimensionality of image data can make the search process computationally expensive, particularly for large databases.To address these challenges, approximate nearest neighbor (ANN) search algorithms, such as the HNSW graph, are often used to optimize the search process. These algorithms sacrifice a small amount of accuracy for a significant increase in search speed, making them a practical choice for large-scale image search applications.With the advent of Elasticsearch’s dense vector field type, it is now possible to index and search highdimensional vectors directly within an Elasticsearch cluster. This functionality, combined with an appropriate feature extraction model, provides a powerful toolset for building efficient and scalable image search systems.In the following sections, we will delve into the details of image feature extraction, vector indexing, and search techniques. We will also demonstrate how to implement an image search system using Elasticsearch and a pre-trained CNN model for feature extraction. The overarching goal is to provide a comprehensive guide for building and optimizing image search systems using state-of-the-art technology.Vector search with imagesVector search is a transformative feature of Elasticsearch and other vector stores that enables a method for performing searches within complex data types such as images. Through this approach, images are converted into vectors that can be indexed, searched, and compared against each other, revolutionizing the way we can retrieve and analyze image data. This inherent characteristic of producing embeddings applies to other media types as well. This section provides an in-depth overview of the vector search process with images, including image vectorization, vector indexing in Elasticsearch, kNN search, vector similarity metrics, and fine-tuning the kNN algorithm.Image vectorizationThe first phase of the vector search process involves transforming the image data into a vector, a process known as image vectorization. Deep learning models, specifically CNNs, are typically employed for this task. CNNs are designed to understand and capture the intricate features of an image, such as color distribution, shapes, textures, and patterns. By processing an image through layers of convolutional, pooling, and fully connected nodes, a CNN can represent an image as a high-dimensional vector. This vector encapsulates the key features of the image, serving as its numerical representation.The output layer of a pre-trained CNN (often referred to as an embedding or feature vector) is often used for this purpose. Each dimension in this vector represents some learned feature from the image. For instance, one dimension might correspond to the presence of a particular color or texture pattern.The values in the vector quantify the extent to which these features are present in the image.Figure 1 : Layers of a CNN modelAs seen in the preceding diagram, these are the layers of a CNN model:1. Accepts raw pixel values of the image as input.2. Each layer extracts specific features such as edges, corners, textures, and so on.3. Introduces non-linearity, learns from errors, and approximates more complex functions.4. Reduces the dimensions of feature maps through down-sampling to decrease the computational complexity.5. Consists of the weights and biases from the previous layers for the classification process to take place.6. Outputs a probability distribution over classes.Indexing image vectors in ElasticsearchOnce the image vectors have been obtained, the next step is to index these vectors in Elasticsearch for future searching. Elasticsearch provides a special field type, the dense_vector field, to handle the storage of these high-dimensional vectors.A dense_vector field is defined as an array of numeric values, typically floating-point numbers, with a specified number of dimensions (dims). The maximum number of dimensions allowed for indexed vectors is currently 2,048, though this may be further increased in the future. It’s essential to note that each dense_vector field is single-valued, meaning that it is not possible to store multiple values in one such field.In the context of image search, each image (now represented as a vector) is indexed into an Elasticsearch document. This vector can be one per document or multiple vectors per document. The vector representing the image is stored in a dense_vector field within the document. Additionally, other relevant information or metadata about the image can be stored in other fields within the same document.The full example code can be found in the Jupyter Notebook available in the chapter 5 folder of this book’s GitHub repository at https://github.com/PacktPublishing/VectorSearch-for-Practitioners-with-Elastic/tree/main/chapter5, but we’ll discuss the relevant parts here.First, we will initialize a pre-trained model using the SentenceTransformer library.The clip-ViT-B-32-multilingual-v1 model is discussed in detail later in this chapter:model = SentenceTransformer('clip-ViT-B-32-multilingual-v1')Next, we will prepare the image transformation function:transform = transforms.Compose([ transforms.Resize(224), transforms.CenterCrop(224), lambda image: image.convert("RGB"), transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)), ])Transforms.Compose() combines all the following transformations:transforms.Resize(224): Resizes the shorter side of the image to 224 pixels while maintaining the aspect ratio.transforms.CenterCrop(224): Crops the center of the image so that the resultant image has dimensions of 224x224 pixels.lambda image: image.convert("RGB"): This is a transformation that converts the image to the RGB format. This is useful for grayscale images or images with an alpha channel, as deep learning models typically expect RGB inputs.transforms.ToTensor(): Converts the image (in the PIL image format) into a PyTorch tensor. This will change the data from a range of [0, 255] in the PIL image format to a float in a range [0.0, 1.0].transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)): Normalizes the tensor image with a given mean and standard deviation for each channel. In this case, the mean and standard deviation for all three channels (R, G, B) are 0.5. This normalization will transform the data range from [0.0, 1.0] to [-1.0, 1.0].We can use the following code to apply the transform to an image file and then generate an image vector using the model. See the Python notebook for this chapter to run against actual image files:from PIL import Image img = Image.open("image_file.jpg") image = transform(img).unsqueeze(0) image_vector = model.encode(image)The vector and other associated data can then be indexed into Elasticsearch for use with kNN search:# Create document document = {'_index': index_name, '_source': {"filename": filename, "image_vector": vector See the complete code in the chapter 5 folder of this book’s GitHub repository.With vectors generated and indexed into Elasticsearch, we can move on to searching for similar images.k-Nearest Neighbor (kNN) searchWith the vectors now indexed in Elasticsearch, the next step is to make use of kNN search. You can refer back to Chapter 2, Getting Started with Vector Search in Elastic, for a full discussion on kNN and HNSW search.As with text-based vector search, when performing vector search with images, we first need to convert our query image to a vector. The process is the same as we used to convert images to vectors at index time.We convert the image to a vector and include that vector in the query_vector parameter of the knn search function:knn = { "field": "image_vector", "query_vector": search_image_vector[0], "k": 1, "num_candidates": 10 }Here, we specify the following:field: The field in the index that contains vector representations of images we are searching againstquery_vector: The vector representation of our query imagek: We want only one closest imagenum_candidates: The number of approximate nearest neighbor candidates on each shard to search againstWith an understanding of how to convert an image to a vector representation and perform an approximate nearest neighbor search, let’s discuss some of the challenges.Challenges and limitations with image searchWhile vector search with images offers powerful capabilities for image retrieval, it also comes with certain challenges and limitations. One of the main challenges is the high dimensionality of image vectors, which can lead to computational inefficiencies and difficulties in visualizing and interpreting the data.Additionally, while pre-trained models for feature extraction can capture a wide range of features, they may not always align with the specific features that are relevant to a particular use case. This can lead to suboptimal search results. One potential solution, not limited to image search, is to use transfer learning to fine-tune the feature extraction model on a specific task, although this requires additional data and computational resources.ConclusionIn conclusion, vector similarity search revolutionizes image retrieval by harnessing advanced algorithms and machine learning. From e-commerce to digital forensics, its impact is profound, enhancing user experiences and content discovery. Leveraging techniques like k-Nearest Neighbor search and Elasticsearch's dense vector field, image search becomes more efficient and scalable. Despite challenges, such as high dimensionality and feature alignment, ongoing advancements promise even greater insights into visual data. As technology evolves, so does our ability to navigate and understand the vast landscape of images, ensuring a future of enhanced digital interactions and insights.Author BioBahaaldine Azarmi, Global VP Customer Engineering at Elastic, guides companies as they leverage data architecture, distributed systems, machine learning, and generative AI. He leads the customer engineering team, focusing on cloud consumption, and is passionate about sharing knowledge to build and inspire a community skilled in AI.Jeff Vestal has a rich background spanning over a decade in financial trading firms and extensive experience with Elasticsearch. He offers a unique blend of operational acumen, engineering skills, and machine learning expertise. As a Principal Customer Enterprise Architect, he excels at crafting innovative solutions, leveraging Elasticsearch's advanced search capabilities, machine learning features, and generative AI integrations, adeptly guiding users to transform complex data challenges into actionable insights.

0
0
30457

article-image-using-chatgpt-for-customer-service

Amita Kapoor

07 Mar 2024

10 min read

Using ChatGPT for Customer Service

Amita Kapoor

07 Mar 2024

10 min read

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights and books. Don't miss out – sign up today!IntroductionCustomer service bots of old can often feel robotic, rigid, and painfully predictable. But enter ChatGPT: the fresher, more dynamic contender in the bot arena.ChatGPT isn't just another bot. It's been meticulously trained on a vast sea of text and code, equipping it to grapple with questions that would stump its predecessors. And it's not limited to just customer queries; this versatile bot can craft a range of text formats, from poems to programming snippets.But the standout feature? ChatGPT's touch of humour. It's not just about answering questions; it's about engaging in a way that's both informative and entertaining. So if you're in search of a customer service experience that's more captivating than the norm, it might be time to chat with ChatGPT. Onboarding ChatGPT: A Quick and Easy GuideReady to set sail with ChatGPT? Here's your easy guide to make sure you're all set and ready to roll:1. Obtain the API Key: First, you'll need to get an API key from OpenAI. This is like your secret password to the world of ChatGPT. To get an API key, head to the OpenAI platform and sign up. Once you're signed in, go to the API section and click on "Create New Key."2. Integrate ChatGPT with Your System: Once you have your API key, you can integrate ChatGPT with your system. This is like introducing ChatGPT to your system and making sure they're friends, ready to work together smoothly. To integrate ChatGPT, you'll need to add your API key into your system's code. The specific steps involved will vary depending on your system, but there are many resources available online to help you. Here is an example of how you can do it in Python:import openai import os # Initialize OpenAI API Client api_key = os.environ.get("OPENAI_API_KEY") # Retrieve the API key from environment variables openai.api_key = api_key # Set the API key # API parameters model = "gpt-3.5-turbo" # Choose the appropriate engine max_tokens = 150 # Limit the response length3. Fine-Tune ChatGPT (Optional): ChatGPT is super smart, but sometimes you might need it to learn some specific stuff about your company. That's where fine-tuning comes in. To fine-tune ChatGPT, you can provide it with training data that is specific to your company. This could include product information, customer service FAQs, or even just examples of the types of conversations that you want ChatGPT to be able to handle. Fine-tuning is not required, but it can help to improve the performance of ChatGPT on your specific tasks. [https://www.packtpub.com/article-hub/fine-tuning-gpt-35-and-4].And that's it! With these three steps, ChatGPT will be all set to jump in and take your customer service to the next level. Ready, set, ChatGPT!Utilise ChatGPT for Seamless Question AnsweringIn the ever-evolving world of customer service, stand out by integrating ChatGPT into your service channels, making real-time, accurate response a seamless experience for your customers. Let’s delve into an example to understand the process better.Example: EdTech Site with Online K-12 CoursesImagine operating a customer service bot for an EdTech site with online courses for K-12. You want to ensure that the bot provides answers only on relevant questions, enhancing the user experience and ensuring the accuracy and efficiency of responses. Here's how you can achieve this:1. Pre-defined Context:Initiate the conversation with a system message that sets the context for the bot’s role.role_gpt = "You are a customer service assistant for an EdTech site that offers online K-12 courses. Provide information and assistance regarding the courses, enrollment, and related queries." This directive helps guide the model's responses, ensuring they align with the expected topics.2. Keyword Filtering:Implement keyword filtering to review user’s queries for relevance to topics the bot handles. If the query includes keywords related to courses, enrollment, etc., the bot answers; otherwise, it informs the user about the limitation. Here's a basic example of a keyword filtering function in Python. This function is_relevant_query checks if the query contains certain keywords related to the services offered by the EdTech site.def is_relevant_query(query, keywords): """ Check if the query contains any of the specified keywords. :param query: str, the user's query :param keywords: list of str, keywords to check for :return: bool, True if query contains any keyword, False otherwise """ query = query.lower() return any(keyword in query for keyword in keywords) # Usage example: keywords = ['enrollment', 'courses', 'k-12', 'online learning'] query = "Tell me about the enrollment process." is_relevant = is_relevant_query(query, keywords)Next, we combine the bot role and user query to build the complete messagemessages = [ { "role": "system", "content": f"{role_gpt}" }, {"role": "user", "content": f"{query}"} ]We now make the openAI API can only when the question is relevant:is_relevant = is_relevant_query(query, keywords) if is_relevant: # Process the query with ChatGPT # Make API call response = openai.ChatCompletion.create( model=model, messages=messages ) # Extract and print chatbot's reply chatbot_reply = response['choices'][0]['message']['content' print("ChatGPT: ", chatbot_reply) else: print("I'm sorry, I can only answer questions related to enrollment, courses, and online learning for K-12.")To elevate the user experience, prompt your customers to use specific questions. This subtle guidance helps funnel their queries, ensuring they stay on-topic and receive the most relevant information quickly. Continuous observation of user interactions and consistent collection of their feedback is paramount. This valuable insight allows you to refine your bot, making it more intuitive and adept at handling various questions. Further enhancing the bot's efficiency, enable a feature where it can politely ask for clarification on vague or ambiguous inquiries. This ensures your bot continues to provide precise and relevant answers, solidifying its role as an invaluable resource for your customers.Utilise ChatGPT to tackle Frequently Asked QuestionsAmidst the myriad of queries in customer service, frequently asked questions (FAQs) create a pattern. With ChatGPT, transform the typical, monotonous FAQ experience into an engaging and efficient one.Example: A Hospital ChatbotConsider the scenario of a hospital chatbot. Patients might have numerous questions before and after appointments. They might be inquiring about the hospital’s visitor policies, appointment scheduling, post-consultation care, or the availability of specialists. A well-implemented ChatGPT can swiftly and accurately tackle these questions, giving relief to both the hospital staff and the patients. Here is a tentative role setting for such a bot:role_gpt = "You are a friendly assistant for a hospital, guiding users with appointment scheduling, hospital policies, and post-consultation care."This orientation anchors the bot within the healthcare context, offering relevant and timely patient information. For optimal results, a finely tuned ChatGPT model for this use case is ideal. This enhancement allows for precise, context-aware processing of healthcare-related queries, ensuring your chatbot stands as a trustworthy, efficient resource for patient inquiries.The approach outlined above can be seamlessly adapted to various other sectors. Imagine a travel agency, where customers frequently inquire about trip details, booking procedures, and cancellation policies. Or consider a retail setting, where questions about product availability, return policies, and shipping details abound. Universities can employ ChatGPT to assist students and parents with admission queries, course details, and campus information. Even local government offices can utilize ChatGPT to provide citizens with instant information about public services, documentation procedures, and local regulations. In each scenario, a tailored ChatGPT, possibly fine-tuned for the specific industry, can provide swift, clear, and accurate responses, elevating the customer experience and allowing human staff to focus on more complex tasks. The possibilities are boundless, underscoring the transformative potential of integrating ChatGPT in customer service across diverse sectors. Adventures in AI Land🐙 Octopus Energy: Hailing from the UK's bustling lanes, Octopus Energy unleashed ChatGPT into the wild world of customer inquiries. Lo and behold, handling nearly half of all questions, ChatGPT isn’t just holding the fort – it’s conquering, earning accolades and outshining its human allies in ratings!📘 Chegg: Fear not, night-owl students! The world of academia isn’t left behind in the AI revolution. Chegg, armed with the mighty ChatGPT (aka Cheggmate), stands as the valiant knight ready to battle those brain-teasing queries when the world sleeps at 2 AM. Say goodbye to the midnight oil blues!🥤 PepsiCo: Oh, the fizz and dazzle! The giants aren’t just watching from the sidelines. PepsiCo, joining forces with Bain & Company, bestowed upon ChatGPT the quill to script their advertisements. Now every pop and fizz of their beverages echo with the whispers of AI, making each gulp a symphony of allure and refreshment.Ethical Considerations for Customer Service ChatGPTIn the journey of enhancing customer service with ChatGPT, companies should hold the compass of ethical considerations steadfast. Navigate through the AI world with a responsible map that ensures not just efficiency and innovation but also the upholding of ethical standards. Below are the vital checkpoints to ensure the ethical use of ChatGPT in customer service:Transparency: Uphold honesty by ensuring customers know they are interacting with a machine learning model. This clarity builds a foundation of trust and sets the right expectations.Data Privacy: Safeguard customer data with robust security measures, ensuring protection against unauthorized access and adherence to relevant data protection regulations. For further analysis or training, use anonymized data, safeguarding customer identity and sensitive information.Accountability: Keep a watchful eye on AI interactions, ensuring the responses are accurate, relevant, and appropriate. Establish a system for accountability and continuous improvement.Legal Compliance: Keep the use of AI in customer service within the bounds of relevant laws and regulations, ensuring compliance with AI, data protection, and customer rights laws.User Autonomy: Ensure customers have the choice to switch to a human representative, maintaining their comfort and ensuring their queries are comprehensively addressed.TConclusionTo Wrap it Up (with a Bow), if you're all about leveling up your customer service game, ChatGPT's your partner-in-crime. But like any good tool, it's all about how you wield it. So, gear up, fine-tune, and dive into this AI adventure!Author BioAmita Kapoor is an accomplished AI consultant and educator with over 25 years of experience. She has received international recognition for her work, including the DAAD fellowship and the Intel Developer Mesh AI Innovator Award. She is a highly respected scholar with over 100 research papers and several best-selling books on deep learning and AI. After teaching for 25 years at the University of Delhi, Amita retired early and turned her focus to democratizing AI education. She currently serves as a member of the Board of Directors for the non-profit Neuromatch Academy, fostering greater accessibility to knowledge and resources in the field. After her retirement, Amita founded NePeur, a company providing data analytics and AI consultancy services. In addition, she shares her expertise with a global audience by teaching online classes on data science and AI at the University of Oxford.

0
0
63928

article-image-streamlining-insights-with-microsoft-copilot

Gus Frazer

04 Mar 2024

9 min read

Streamlining Insights with Microsoft Copilot

Gus Frazer

04 Mar 2024

9 min read

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!This article is an excerpt from the book, Data Cleaning with Power BI, by Gus Frazer. Unlock the full potential of your data by mastering the art of cleaning, preparing, and transforming data with Power BI for smarter insights and data visualizationsIntroductionFor those who have never heard of Microsoft Copilot, it is a new technology that Microsoft has released across a number of its platforms that combines generative AI with your data to enhance productivity. Copilot for Power BI harnesses cutting-edge generative AI alongside your dataset, revolutionizing the process of uncovering and disseminating insights with unprecedented speed. Seamlessly integrated into your workflow, Copilot offers an array of functionalities aimed at streamlining your reporting experience.When it comes to report creation, Copilot streamlines the process by allowing users to effortlessly generate reports by articulating the insights they seek or posing questions regarding their dataset using NLP. Copilot then analyzes the data, pulling together relevant information to craft visually striking reports, thereby transforming raw data into actionable insights instantaneously. Moreover, Copilot has the ability to read your data and suggest the best position to begin your analysis, which can then be tailored to suit the direction you want to take the analysis in.This is great, but how can it help you clean and prepare data for analysis? Well, Copilot can be leveraged on multiple data tools from within the Microsoft Fabric platform. For those who are not aware, Power BI has now become part of the Fabric platform. Depending on what type of license you have for Power BI, you might already have access to this. Any customers with Premium capacity licensing for the Power BI service would have automatically been given access to Microsoft Fabric, and more importantly, Copilot.That being said, currently, Copilot has only been made available to customers with a P1 (or above) Premium capacity or a Fabric license of F64 (or above), which is the equivalent licensing available directly from the Azure portal.If you would like to follow along with the next example, you will need to set up a Fabric capacity within your Azure portal. Don’t worry, you can pause this service when it’s not being used to ensure you are only charged for the time you’re using it. Alternatively, follow the steps to see the outcome:1. Log in to the Azure portal that you set up in the previous section of this chapter.2. Select the search bar at the top of the page and type in Microsoft Fabric. Select the service in the menu that appears below the search bar, which should take you to the page where you can manage your capacities.3. Select Create a Fabric capacity. Note that you will need to use an organizational account in order to create a Fabric capacity as opposed to a personal account. You can sign up for a Microsoft Fabric trial for your organization within the window. Further details on how to do this are provided here: https://learn.microsoft.com/en-us/power-bi/ enterprise/service-admin-signing-up-for-power-bi-with-a-newoffice-365-trial.4. Select the subscription and resource group you would like to use for this Fabric capacity.5. Then, under capacity details, you can enter your capacity name. In this example, you can call it cleaningdata.6. The Region field should populate with the region of your tenant, but you can change this if you like. However, this may have implications on performance, which it should warn you about with a message.7. Set the capacity to F64.8. Then, click on select Review + create.9. Review the terms and then click on Create, which will begin the deployment of your capacity.10. Once deployed, select Go to resource to view your Fabric capacity. Take note that this will be active once deployed. Make sure to return here aft er testing to pause or delete your Fabric capacity to prevent yourself from getting charged for this service.Now you will need to ensure you have activated the Copilot settings from within your Fabric capacity. To do this, go to https://app.powerbi.com/admin-portal/ to log in and access the admin portal.Important tipIf you can’t see the Tenant settings tab, then you will need to ensure you have been set up as an admin within your Microsoft 365 admin center. If you have just created a new account, then you will need to set this up. Follow the next links to assign roles:• https://learn.microsoft.com/en-us/microsoft-365/admin/addusers/assign-admin-roles• https://learn.microsoft.com/en-us/fabric/admin/microsoftfabric-admin11. Scroll to the bottom of Tenant settings until you see the Copilot and Azure OpenAI service (preview) section as shown:Figure – The tenant settings from within Power BI12. Ensure both settings are set to Enabled and then click on Apply.Now that you have created your Fabric capacity, let’s jump into an example of how we can use Copilot to help with the cleaning of data. As we have created a new capacity, you will have to create a new workspace that uses this new capacity:1. Navigate back to Workspaces using the left navigation bar. Then, select New Workspace.2. Name your workspace CleaningData(Copilot), then select the dropdown for advanced configuration settings.3. Ensure you have selected Fabric capacity in the license mode, which in turn will have selected your capacity below, and then select Apply. You have now created your capacity!4. Now let’s use Fabric to create a new dataflow using the latest update of Datafl ow Gen2. Select New from within the workspace and then select More options.5. This will navigate you to a page with all the possible actions to create items within your Fabric workspace. Under Data Factory, select Datafl ow Gen2.6. This will load a Datafl ow Gen2 instance called Datafl ow 1. On the top row, you should now see the Copilot logo within the Home ribbon as highlighted:Figure – The ribbon within a Dataflow Gen2 instance7. Select Copilot to open the Copilot window on the right-hand side of the page. As you have not connected to any data, it will prompt you to select get data.8. Select Text/CSV and then enter the following into the File path or URL box:https://raw.githubusercontent.com/PacktPublishing/Data-Cleaningwith-Power-BI/main/Retail%20Store%20Sales%20Data.csv9. Leave the rest of the settings as their defaults and click on Next.10. This will then open a preview of the file data. Click on Create to load this data into your Datafl ow Gen2 instance. You will see that the Copilot window will have now changed to prompt you as to what you would like to do (if it hasn’t, then simply close the Copilot window and reopen):Figure – Data loaded into Dataflow Gen211. In this example, we can see that the data includes a column called Order Date but we don’t have a fi eld for the fi scal year. Enter the following prompt to ask Copilot to help with the transformation:There's a column in the data named Order Date, which shows when an order was placed. However, I need to create a new column from this that shows the Fiscal Year. Can you extract the year from the date and call this Fiscal Year? Set this new column to type number also.12. Proceed using the arrow key or press Enter. Copilot will then begin working on your request. As you will see in the resulting output, the model has added a function (or step) called Custom to the query that we had selected.13. Scroll to the far side and you will see that this has added a new column called Fiscal Year.14. Now add the following prompt to narrow down our data and press Enter:Can you now remove all columns leaving me with just Order ID, Order Date, Fiscal year, category, and Sales?15. This will then add another function or step called Choose columns. Finally, add the following prompt to aggregate this data and press Enter:Can you now group this data by Category, Fiscal year, and aggregated by Sum of Sales?As you can see, Copilot has now added another function called Custom 1 to the applied steps in this query, resulting in this table:Figure – The results from asking Copilot to transform the dataTo view the M query that Copilot has added, select Advanced editor, which will show the functions that Copilot has added for you:Figure – The resulting M query created by Copilot to carry out the request transformations to clean the dataIn this example, you explored the new technologies available with Copilot and how they help to transform the data using tools such as Datafl ow Gen2.While it’s great to understand the amazing possibilities AI brings to data, it’s also crucially important that you understand the challenges it presents.ConclusionIn conclusion, Microsoft Copilot offers a groundbreaking approach to enhancing productivity and efficiency in data analysis and report generation within Power BI. By seamlessly integrating generative AI technology, Copilot revolutionizes the way insights are discovered and data is prepared, providing users with unprecedented speed and accuracy. Whether streamlining report creation or optimizing data management tasks, Copilot empowers users to unlock the full potential of their data, paving the way for more informed decision-making and actionable insights.Author BioGus Frazer is a seasoned Analytics Consultant focused on Business Intelligence solutions. With over 7 years of experience working for the two market-leading platforms, Power BI & Tableau, has amassed a wealth of knowledge and expertise. Gus has helped hundreds of customers to drive their digital and data transformations, scope data requirements, drive actionable insights, and most important of all, cleanse data ready for analysis. Most recently helping to set up, organize and run the Power BI UK community at Microsoft. He holds 6 Azure and Power BI certifications, including the PL-300 and DP-500 certifications. In this book, Gus offers readers invaluable guidance on ingesting, preparing, and cleansing data for analysis in Power BI. --This text refers to an out of print or unavailable edition of this title.

0
0
21345

article-image-ai-distilled-38-latest-in-ai-sora-gemini-15-and-more

Merlyn Shelley

01 Mar 2024

9 min read

AI_Distilled 38: Latest in AI: Sora, Gemini 1.5, and More

Merlyn Shelley

01 Mar 2024

9 min read

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!👋 Hello,“People say AI is overhyped, but I think it's not hyped enough. The next generation who will use this in the next few years will have a much higher bar on what technology can do for them. So how you build it for that generation, how you build it for that future will be really interesting to see.”-Puneet Chandok, Microsoft India and South Asia presidentSpeaking at a panel discussion on AI at the Mumbai Tech Week, Chandok believes AI is not hyped enough considering its potential for disruptive transformation. He encourages more training on AI to realize its full potential.Welcome back to a new issue of AI Distilled - your one-stop destination for all things AI, ML, NLP, and Gen AI. Let’s get started with the latest news and developments across the AI sector:OpenAI unveils Sora, an AI model generating videos from textGoogle's latest conversational AI model Gemini 1.5 has a million-token context windowNew AI news reader app tackles clickbait headlines, provides summariesSlack is rolling out new AI features for enterprise users including thread summariesLangChain announced raising $25 million to launch new platform for building LLM appsAI helps improve medical imaging to benefit patients globallyResearchers develop AI model that determines a person's sex from brain scansWe’ve also curated the latest GPT and LLM resources, tutorials, and secret knowledge:Giving AI Models a Better Memory: How Google DeepMind Expanded Context WindowsAdvanced Techniques For More Relevant AI ResponsesReinforcement Learning ExplainedBridging the Gap Between AI and App DevelopmentFinally, don’t forget to check-out our hands-on tips and strategies from the AI community for you to use on your own projects:Creating Custom Models Without the Hassle of Data CollectionCode Your Own AI Coding BuddyEvaluating Code Quality with AI AssistantsEasily Deploy Language Models LocallyLooking for some inspiration? Here are some GitHub repositories to get your projects going!gptscript-ai/gptscriptkarpathy/minbpeAAAI-DISIM-UnivAQ/DALIQwenLM/QwenWriter’s Credit: Special shout-out to Vidhu Jain for her valuable contribution to this week’s issue.Cheers, Kartikey Pandey Editor-in-Chief, Packt ⚡ TechWave: AI/GPT News & AnalysisOpenAI unveiled Sora, an AI model generating videos from text at up to a minute in length. Sora demonstrates an understanding of language and the physical world and photorealism across styles, though human subjects appear game-like.Google's latest conversational AI model Gemini 1.5 analyzes more information than before, thanks to a million-token context window. This allows for summarizing the Apollo 11 mission transcript or analyzing a 44-minute silent film in full. Early results show the system maintains performance as context grows into the millions.Bulletin, a new AI-powered news reader app, tackles clickbait headlines and provides summaries of news articles with customizable news sources.Slack is rolling out new AI features for enterprise users including thread summaries, channel recaps, and answering workplace questions. The tools provide highlights from missed messages and help catch up.LangChain announced raising $25 million to launch their new platform LangSmith for building and monitoring LLM apps. LangSmith allows developers to accelerate workflows across development, testing, deployment, and monitoring. It has already seen significant adoption with over 70,000 signups and 5000 monthly active companies.Courtesy: Bulletin/Shihab MehboobAI is helping improve medical imaging to benefit patients globally. ML can quickly analyze large datasets to find issues doctors may miss and flag urgent cases. Cloud solutions also enable sharing scans and remote expert assistance anywhere. Companies are applying these methods to speed diagnoses, reduce wait times, and bring ultrasounds directly to homes. Researchers have also developed an AI model that can determine a person's sex from brain scans with over 90% accuracy. The model analyzed dynamic MRI scans and identified the default mode, striatum, and limbic networks as key in distinguishing male and female brains. This breakthrough furthers our understanding of brain organization and could help address sex-specific health issues. 🔮 Expert Insights from Packt Community Generative AI with LangChain - By Dr. Ben AuffarthChatGPT and the GPT models by OpenAI have brought about a revolution not only in how we write and research but also in how we can process information.This book discusses the functioning, capabilities, and limitations of LLMs underlying chat systems, including ChatGPT and Bard. It also demonstrates, in a series of practical examples, how to use the LangChain framework to build production-ready and responsive LLM applications for tasks ranging from customer support to software development assistance and data analysis Key TakeawaysExplore the expansive utility of LLMs in real-world applications.Guidance on fine-tuning, prompt engineering, and best practices.Learn how to use the LangChain framework to build production-ready LLM applications.By the end of this book, you'll be equipped with the practical knowledge and skills to leverage the transformative power of generative AI with confidence and creativity.Read More🌟 Secret Knowledge: AI/LLM Resources🌀 Giving AI Models a Better Memory: How Google DeepMind Expanded Context Windows: Google DeepMind's latest AI model Gemini 1.5 has significantly improved how much information it can process at once, thanks to advances in "long context windows." The team discovered their model could understand over 1 million pieces of information in a single sitting, far surpassing earlier limits. This opens up new possibilities for tasks like summarizing lengthy documents, analyzing large codebases, and even comprehending full movies. Developers are excited to explore creative uses of this expanded recall.🌀 Advanced Techniques For More Relevant AI Responses: This article discusses how to improve AI conversation models like RAG by enhancing how information is stored, found and used. Methods covered include indexing sentences individually while keeping their surrounding context, combining keyword search with semantic search, and re-scoring results based on the question. The author demonstrates implementing these "advanced RAG" techniques in Python using tools like LlamaIndex and Weaviate. With these optimizations, AI systems can provide more helpful responses by accessing knowledge in a targeted manner.🌀 Reinforcement Learning Explained: This article breaks down the key concepts of reinforcement learning in an easy-to-understand way. It covers states, actions, rewards, and how agents interact with environments to learn policies. RL agents try different strategies to maximize long-term rewards through trial and error. Episodes provide a framework to evaluate policies. Deterministic policies pick set actions while stochastic policies use probabilities. Whether you're new to RL or a veteran, this primer is worth a read to get acquainted with the basics.🌀 Bridging the Gap Between AI and App Development: As AI becomes more advanced, developers need easier ways to integrate cutting-edge features into their work. However, directly using AI code frameworks can be challenging and limit scalability. The solution? AI gateways. By handling tasks like routing, caching, and monitoring behind the scenes, gateways act as a bridge between complex AI systems and traditional development workflows. They streamline the integration process while ensuring high performance. Are gateways the future of intelligent applications?Partnering with Notion Ever tried Notion? It's a workspace that helps you do things better and faster.You get AI for notes and teamwork, easy drag-and-drop for content, and cool new features to help manage projects and share knowledge.Give it a try!🔛 Masterclass: AI/LLM Tutorials🌀 Creating Custom Models Without the Hassle of Data Collection: Tired of spending big bucks to use proprietary AI APIs or going through the tedious process of collecting your training data? This page shows how you can train customized models more efficiently. By using an open-source LLM to generate synthetic annotations for a small sample of your data, you can then fine-tune a smaller model tailored exactly to your needs. The process takes just a few steps and allows you to analyze large datasets for a fraction of the cost. Best of all, you avoid sending sensitive data to third parties.🌀 Code Your Own AI Coding Buddy: This guide shows you how to build an AI assistant that lives right on your computer. Using tools like HuggingFace and Streamlit, you can create a chatbot trained on Code Llama. Simply ask it questions and it will respond with examples in languages like Python, Java, and C++. Better yet, the models are free and open-source. This is a neural net sidekick to help automate repetitive tasks and speed up your workflow.🌀 Evaluating Code Quality with AI Assistants: This article explores using AI to improve code quality by testing Python scripts with SonarQube and getting feedback from LLMs. The author ran tests on ChatGPT and open-source models like Code Llama to see if they could identify issues flagged by SonarQube. While the models struggled to pinpoint errors solely from descriptions, some provided insightful summaries. Continued development of coding-focused LLMs may help automate part of the review process.🌀 Easily Deploy Language Models Locally: With a simple four-step process, you can get powerful language models like ChatGPT running on your hardware. First, choose a model from HuggingFace and quantize it for faster performance. Then build an Ollama image to serve the model. For a slick interface, deploy a ChatGPT-style React app talking to Ollama via Docker. The whole setup only takes around 15 minutes. Now you've got a custom language assistant without internet dependence.🚀 HackHub: Trending AI Tools🌀 gptscript-ai/gptscript: Open source NLP tool that allows developers to automate tasks by writing scripts in plain English.🌀 karpathy/minbpe: Minimal and clean Python code for the byte pair encoding algorithm commonly used in NLP and language model tokenization.🌀 AAAI-DISIM-UnivAQ/DALI: Framework allowing developers to build multi-agent systems in Prolog for applications like robotics, event processing, and more.🌀 QwenLM/Qwen: Open source code, models, and documentation for the Qwen series of LLMs, including Qwen, Qwen-Chat, and their various sizes.

0
0
48324

article-image-getting-started-with-microsoft-guidance

Prakhar Mishra

28 Feb 2024

8 min read

Getting Started with Microsoft Guidance

Prakhar Mishra

28 Feb 2024

8 min read

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights and books. Don't miss out – sign up today!IntroductionThe emergence of a massive language model is a watershed moment in the field of artificial intelligence (AI) and natural language processing (NLP). Because of their extraordinary capacity to write human-like text and perform a range of language-related tasks, these models, which are based on deep learning techniques, have earned considerable interest and acceptance. This field has undergone significant scientific developments in recent years. Researchers all over the world have been developing better and more domain-specific LLMs to meet the needs of various use cases.Large Language Models (LLMs) such as GPT-3 and its descendants, like any technology or strategy, have downsides and limits. And, in order to use LLMs properly, ethically, and to their maximum capacity, it is critical to grasp their downsides and limitations. Unlike large language models such as GPT-4, which can follow the majority of commands. Language models that are not equivalently large enough (such as GPT-2, LLaMa, and its derivatives) frequently suffer from the difficulty of not following instructions adequately, particularly the part of instruction that asks for generating output in a specific structure. This causes a bottleneck when constructing a pipeline in which the output of LLMs is fed to other downstream functions.Introducing Guidance - an effective and efficient means of controlling modern language models compared to conventional prompting methods. It supports both open (LLaMa, GPT-2, Alpaca, and so on) and closed LLMs (ChatGPT, GPT-4, and so on). It can be considered as a part of a larger ecosystem of tools for expanding the capabilities of language models.Guidance uses Handlebars - a templating language. Handlebars allow us to build semantic templates effectively by compiling templates into JavaScript functions. Making it’s execution faster than other templating engines. Guidance also integrates well with Jsonformer - a bulletproof way to generate structured JSON from language models. Here’s a detailed notebook on the same. Also, in case you were to use OpenAI from Azure AI then Guidance has you covered - notebook.Moving on to some of the outstanding features that Guidance offers. Feel free to check out the entire list of features.Features1. Guidance Acceleration - This addition significantly improves inference performance by efficiently utilizing the Key/Value caches as we proceed through the prompt by keeping a session state with the LLM inference. Benchmarking revealed a 50% reduction in runtime when compared to standard prompting approaches. Here’s the link to one of the benchmarking exercises. The below image shows an example of generating a character profile of an RPG game in JSON format. The green highlights are the generations done by the model, whereas the blue and no highlights are the ones that are copied as it is from the input prompt, unlike the traditional method that tries to generate every bit of it.SourceNote: As of now, the Guidance Acceleration feature is implemented for open LLMs. We can soon expect to see if working with closed LLMs as well.2. Token Healing - This feature attempts to correct tokenization artifacts that commonly occur at the border between the end of a prompt and the start of a group of generated tokens.For example - If we ask LLM to auto-complete a URL with the below-mentioned Input, it’s likely to produce the shown output. Apart from the obvious limitation that the URL might not be valid. I'd like to draw your attention to the extra space it creates (highlighted in red). Such considerations make it difficult to construct a dependable parsing function and robustly absorb its result into subsequent phases.Input: “The link is <a href=http:”Actual Output: “The link is <a href=http: //www.google.com/search?q”Expected Output: “The link is <a href=http://www.google.com/search?q” This is the exact bucket of problems that Token Healing tries to solve using the backtracking method. Feel free to check out this jupyter notebook for more examples.3. Guaranteed Output Structure - Large language models are fantastic at producing useful outputs, but not so much at producing outputs in a specified format (especially open-source ones like LLaMa, GPT-2, and so on). When we want to use the output of a language model as input to another system, this is frequently an issue. With Handlebars, guidance guarantees the output format to be the same as what was being asked for.Let’s now see Guidance in action -InstallationInstalling guidance is a breeze, just do a pip :$ pip install guidanceAssume we are now creating a product description for an e-commerce website. Here's how the traditional generation compares to the guidance generation. Feel free to play with this colab notebook with both the below examples.Traditional GenerationInput:Consider you are an e-commerce expert. You need to write a product description for a product to be listed on the e-commerce website. The product description consists of fixed set of fields to be filled in the JSON. The below shown JSON is the structure of the description with constraints for each of the attributes enclosed within < > brackets. Please follow the mentioned constraint and format diligently. { prod_id: <numeric value of 5 digits>, prod_name: <name starts with the prefix 'p_'>, prod_price: <should be an integer between 1 and 16. Should end with suffix '$'> } The product description isOutput:Consider you are an e-commerce expert. You need to write a product description for a product to be listed on the e-commerce website. The product description consists of fixed set of fields to be filled in the JSON. The below shown JSON is the structure of the description with constraints for each of the attributes enclosed within < > brackets. Please follow the mentioned constraint and format diligently. { prod_id: <numeric value of 5 digits>, prod_name: <name starts with the prefix 'p_'>, prod_price: <should be an integer between 1 and 16. Should end with suffix '$'> } The product description is { resentprod_id: <numeric value of 5 digits>, resentprod_name: <name begins with the prefix 'p_'>, resentprod_price: <should be an integer between 1 and 16. Should end with suffix '$'> } In the above example, the product description has 5 constraint fields and 5 attribute fields. The constraints are as follows: resentprod_id: - value of 5 digits, resentprod_name: - name of the product, resentprod_price: - price of the product, resentprod_price_suffix: - suffix of the product price, resentprod_id: - the product id, resentpro diabetic_id: value of 4 digits, resentprod_ astronomer_id: - value of 4 digits, resentprod_ star_id: - value of 4 digits, resentprod_is_generic: - if the product is generic and not the generic type, resentprod_type: - the type of the product, resentprod_is_generic_typeHere’s the code for the above example with GPT-2 language model -``` from transformers import AutoModelForCausalLM, AutoTokenizer tokenizer = AutoTokenizer.from_pretrained("gpt2-large") model = AutoModelForCausalLM.from_pretrained("gpt2-large") inputs = tokenizer(Input, return_tensors="pt") tokens = model.generate( **inputs, max_new_tokens=256, temperature=0.7, do_sample=True, )Output:tokenizer.decode(tokens[0], skip_special_tokens=True)) ```Guidance GenerationInput w/ code:guidance.llm = guidance.llms.Transformers("gpt-large") # define the prompt program = guidance("""Consider you are an e-commerce expert. You need to write a product description for a product to be listed on the e-commerce website. The product description consists of fixed set of fields to be filled in the JSON. The following is the format ```json { "prod_id": "{{gen 'id' pattern='[0-9]{5}' stop=','}}", "prod_name": "{{gen 'name' pattern='p_[A-Za-z]+' stop=','}}", "prod_price": "{{gen 'price' pattern='\b([1-9]|1[0-6])\b\$' stop=','}}" }```""") # execute the prompt Output = program()Output:Consider you are an e-commerce expert. You need to write a product description for a product to be listed on the e-commerce website. The product description consists of a fixed set of fields to be filled in the JSON. The following is the format```json { "prod_id": "11231", "prod_name": "p_pizzas", "prod_price": "11$" }```As seen in the preceding instances, with guidance, we can be certain that the output format will be followed within the given restrictions no matter how many times we execute the identical prompt. This capability makes it an excellent choice for constructing any dependable and strong multi-step LLM pipeline.I hope this overview of Guidance has helped you realize the value it may provide to your daily prompt development cycle. Also, here’s a consolidated notebook showcasing all the features of Guidance, feel free to check it out.Author BioPrakhar has a Master’s in Data Science with over 4 years of experience in industry across various sectors like Retail, Healthcare, Consumer Analytics, etc. His research interests include Natural Language Understanding and generation, and has published multiple research papers in reputed international publications in the relevant domain. Feel free to reach out to him on LinkedIn

0
0
37191

article-image-setting-up-polars-for-data-analysis

Luca Zanna

23 Feb 2024

7 min read

Setting Up Polars for Data Analysis

Luca Zanna

23 Feb 2024

7 min read

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!This article is an excerpt from the book, Data Analysis with Polars, by Luca Zanna. Leverage Polars, the lightning-fast dataframe library, to take your Python data analysis skills to the next levelIntroductionIn the ever-evolving landscape of data analysis, harnessing the right tools and methodologies can make all the difference. Welcome to a world where Polars, a powerful data manipulation library, takes center stage. This article is your gateway to unlocking the potential of Polars, and it begins by unraveling the essential components of the data analysis journey. From setting up virtual environments to simplifying data analysis in the cloud with Google Colab, we explore how Polars streamlines your path to insights. Whether you're a seasoned data analyst or just starting your journey, this guide will equip you with the knowledge and tools needed to make your data analysis endeavors efficient and rewarding. Join us as we delve into the fascinating realm of Polars and embrace a new era of data exploration.Installation and virtual environments We will not go through the installation of Python as that is outside the scope of the book. A visit to python.org will give all the information necessary to install Python. Now on to virtual environments. Understanding Virtual Environments and Their Benefits Imagine you have built a fantastic data analysis project using Polars. Your project uses: Python 3.8Polars version 0.15.1 Numpy 1.23.0 Now, you start a new project, and you want to use a newer Polars (0.16.14), along with Numpy and Arrow. So, the new project requires: Python 3.10 Polars 0.16.14 Numpy 1.24.0 Pyarrow 11.0.0 Upgrading Polars and Numpy libraries globally isn't a good idea. If Polars functions have changed between versions, your first project might stop working or give incorrect results with the new version. This is where virtual environments come in. Virtual environments create separate 'spaces' for each project: one for your first data analysis project and another for your new data pipeline project. You can set up a virtual environment manually or have your IDE set-up a virtual environment for you. If you decide to set it up manually, you can check out the guide at https://packaging.python.org/en/latest/guides/installing-using-pip-and-virtual-environments/#creating-a-virtual-environment. Installing and using Polars on a machine To install Polars, first make sure you are in a virtual environment. Then, type: pip install polars If you already have Polars installed and want to upgrade it, type: pip install polars --upgrade In the book we will use other libraries, including numpy, pandas, matplotlib. You can install them with the syntax above, and you can also install multiple libraries at the same tine: pip install numpy pandas matplotlib Let’s now get our development environment set-up. We will use Visual Studio code, but you are free to use any other IDE that you like. 1. Type code . in the command line to open Visual Studio Code. 2. Right-click on the left, choose New File, and create first_dataframe.ipynb. Figure – Creating a new file in Visual Studio Code Files with extension .ipynb are Jupyter Notebook files, which are great for data analysis. To work with these files you need to install the Jupyter extension on VS Code. You can do that by clicking on ‘Extensions’ on the left bar, searching for Jupyter, installing it, and activating it. Figure – Install Jupyter extension in Visual Studio Code 3. Now back to our file. The first thing to ensure is that we are using Python from our virtual environment. Click on Select Kernel at the top right, then click on the Python that starts with env/: that will be the Python for our virtual environment. Avoid the paths starting with /usr and /bin as those are the system Python instead of our virtual environment. Figure – Select the Python interpreter in Visual Studio code Now, we're ready for Polars. 4. Type import polars as pl in the first cell and press Shift + Enter to run it. 5. Create a dataframe in the next cell by typing: df = pl.DataFrame({ 'a': ['Hello', 'World!'] }) 6. Press Shift + Enter to run the cell. This creates a dataframe called df with one column named 'a' and two rows: 'Hello' and 'World!' To see the dataframe, type df in the next cell and run it. Figure – Visual Studio code with first Polars dataframe We created our first Polars dataframe. Using Polars on the cloud with Google Colab Instead of installing Polars on your computer, you can also use it in the cloud. One popular cloud service for running code is Google Colab. This way, you don't need to install anything on your machine. To access Google Colab, visit https://colab.research.google.com/ in your web browser. Click on "New Notebook," and you'll see a page that looks similar to VS Code. Now, let's create the same Polars dataframe example in Google Colab: 1. In the first cell, type the following command to ensure we have the latest version of Polars: %pip install polars --upgrade 2. Next, enter this code to import Polars and create a dataframe: import polars as pl df = pl.DataFrame({ 'a': ['Hello', 'World !'] }) Finally, display the dataframe by typing: df And that's it! You now have your first Polars dataframe in Google Colab. Figure – Google Colab with first Polars dataframe ConclusionIn closing, Polars offers a bridge to the future of data analysis. With the knowledge and hands-on experience gained from this article, you're well-prepared to conquer the intricacies of data manipulation and visualization. The ability to effortlessly create, manipulate, and analyze data using Polars is a powerful tool in your arsenal. Whether you're a data enthusiast or a seasoned analyst, embracing Polars sets you on a path toward efficiency, precision, and data-driven success. As the data landscape continues to evolve, you're now equipped to stay ahead, make informed decisions, and revolutionize your approach to data exploration.Author BioLuca Zanna is a Data Engineer and Data Analyst with over 15 years of experience. He started his career as a financial data analyst after a Master's in Management and passing the Certified Public Accountant (CPA) exam. Luca spent a decade working on financial analysis systems at L’Oréal: developing the systems and training financial analysts across Europe and Asia.Currently, Luca helps companies with building data infrastructure to better leverage their data. Luca is also a corporate teacher for topics such as data analysis, SQL, Python, and cloud data engineering.

0
0
43446

article-image-leveraging-google-cloud-for-custom-endpoint-with-openai

Henry Habib

20 Feb 2024

8 min read

Leveraging Google Cloud for Custom Endpoint with OpenAI

Henry Habib

20 Feb 2024

8 min read

Dive deeper into the world of AI innovation and stay ahead of the AI curve! Subscribe to our AI_Distilled newsletter for the latest insights. Don't miss out – sign up today!This article is an excerpt from the book, OpenAI API Cookbook, by Henry Habib. Integrate the ChatGPT API into various domains ranging from simple wrappers to knowledge-based assistants, multi-model, and conversational applicationsIntroductionIn the realm of application development, the integration of OpenAI's capabilities through custom backend endpoints on Google Cloud is a pivotal step towards unlocking intelligent solutions. This article explores the process of creating such endpoints using Google Cloud's Cloud Functions, allowing for control, customization, and the seamless integration of OpenAI's features. Through this fusion of technologies, developers can craft innovative applications that leverage the power of artificial intelligence to enhance user experiences and drive creativity in their projects.Creating a public endpoint server that calls the OpenAI APIThere are many important benefits of creating your own public endpoint server that calls the OpenAI API, instead of connecting to the OpenAI API directly – the biggest being control and customization, which we will explore in this recipe and the next recipe.In this recipe, we will use GCP to host our public endpoint. When this endpoint is called, it will make a request to OpenAI for a slogan for an ice cream company and then will return the answer to the user. This sounds simple and almost unnecessary to make a public endpoint, but it is the final step we need to build a truly intelligent application that leverages OpenAI.To do this, we will create a GCP resource called Cloud Functions, which we will explore later in the How it works… section of the recipe.Getting readyEnsure you have an OpenAI platform account with available usage credits. If you don’t, please follow the Setting up your OpenAI API Playground environment recipe in Chapter 1. Furthermore, ensure you have created a GCP account. To do this, navigate to https://cloud. google.com/, then select Start Free from the top right, and follow the instructions that you see.You may need to provide a billing profile as well to create any GCP resources. Note that GCP does have a free tier, and in this recipe, we will not go above the free tier (so, essentially, you should not be billed for anything).You may need to create a project if this is your first time logging into Google Cloud Platform. After you log in, select Select a project from the top left and then select New Project. Provide a project name and then select Create.The next recipe in this chapter will also have this same requirement.How to do it…1. Navigate to https://console.cloud.google.com/. In the Search field at the top of the page, type in Cloud Functions and select the top choice from the drop-down menu, Cloud Functions.Figure – Cloud Functions in the dropdown2. Select Create Function from the top of the page. This will begin to create our custom backend endpoint and start the configuration steps.On the Configuration page, fill in the following steps:Environment: Select 2nd gen from the drop-down menu.Function name: Since we’re creating a backend endpoint that will produce company slogans, the function name will be slogan_creator.Region: Choose the environment location nearest you.In the Trigger menu, choose HTTPS. In the Authentication sub-menu, select Allow unauthenticated invocation. We need to check this as we are going to create a public endpoint that will be accessible from our frontend services. Figure – Sample configuration settings of a Google Cloud Function3. Select the Next button on the bottom of the page to then move on to the Code section.4. From the Runtime dropdown, select Python 3.12. This ensures that our backend endpoint will be coded using the Python programming language.5. For that Entry point option, type in create_slogan. This refers to the name of the function in Python that is called when the public endpoint is reached and triggered.6. On the left-hand side menu, you will see two files: main.py and requirements.txt. Select the requirements.txt file. This will list all the Python packages that need to be installed for our Cloud Function to operate.7. In the center of the screen where the contents of requirements.txt are displayed, enter a new line and type in openai. This will ensure that the latest openai library package is installed. Your screen should look like what’s displayed in Figure below.Figure – Snapshot of the requirements.txt file8. From the left-hand side menu, select main.py. Copy and paste the following code into the center of the screen (where the content for that file is displayed). These are the instructions that the public endpoint will run when it is triggered:import functions_framework from openai import OpenAI @functions_framework.http def create_slogan(request): client = OpenAI(api_key = '<API Key here>') response = client.chat.completions.create( model="gpt-3.5-turbo", messages=[ { "role": "system", "content": "You are an AI assistant that creates one slogan based on company descriptions" }, { "role": "user", "content": "A company that sells ice cream" } ], temperature=1, max_tokens=256, top_p=1, frequency_penalty=0, presence_penalty=0 ) return response.choices[0].message.contentAs you can see, it simply calls the OpenAI endpoint, requests a chat completion, and then returns the output to the user. You will also need your OpenAI API key.9. Next, deploy the function by selecting the Deploy button at the bottom of your page.10. Wait for your function to be fully deployed, which typically takes two minutes. You can verify whether the function has been deployed or not by observing the progress in the top left section of the page (shown in Figure below). Once it is green and checkmarked, the build is successful, and your function has been deployed. Figure – The Cloud Function deployment page11. Now, let’s verify that our function works. Select the endpoint URL, found on the top of the page near URL. It’s typically in the form https://[location]-[project-name]. cloudfunctions.net/[function-name]. It is also highlighted in the above Figure.12. This will open a new web page that will trigger our custom public endpoint, and return a chat completion, which, in this case, is the slogan for an ice cream business. Note that this is a public endpoint – this will work on your computer, phone, or any device connected to the internet.Figure – Output of a Google Cloud FunctionHow it works…In this recipe, we created a public endpoint. This endpoint can be accessed by anyone (including your application in future recipes). The logic of the endpoint is simple and something we have covered prior: return a slogan for a company that sells ice cream. What’s new, however, is that this is our very own public endpoint that is hosted in Google Cloud, using the Cloud Function resource.Note that we used the free tier of Google Cloud Functions, which does have limitations such as a cap on the number of function invocations per month, limited execution time, and constrained computational resources. However, for our current purposes, these limitations are not a hindrance, allowing us to deploy and test our functions effectively without incurring costs. This setup is ideal for small-scale applications or for learning and experimentation purposes, providing a practical way to understand cloud functionalities and serverless architecture in a cost-effective manner.ConclusionIn conclusion, the synergy between Google Cloud's infrastructure and OpenAI's capabilities offers developers a powerful platform for creating intelligent applications. By leveraging Cloud Functions to build custom backend endpoints, developers can unlock a world of possibilities for innovation and creativity. This article has provided a comprehensive guide to integrating OpenAI into Google Cloud, empowering developers to craft intelligent solutions that enhance user experiences and drive the evolution of application development. With this knowledge, developers are well-equipped to embark on their journey of building intelligent applications that push the boundaries of what is possible in the digital landscape.Author BioHenry Habib is a Manager at one of the world's top management consulting firms, advising F500 companies on analytics and operations, with a particular focus on building intelligent AI-driven solutions and tools to create impact. He is a passionate online instructor and educator, amassing a of more than 150K paid students and facilitating technical programs at large banks and governmental.A proponent in the no-code and LLM revolution, he believes that anyone can now create powerful and intelligent applications without any deep technical skills. Henry resides in Toronto, Canada with his wife, and enjoys reading AI research papers and playing tennis in his free time.

0
0
22676

Apple’s ReALM, Google DeepMind’s Gecko, X.ai's Grok 1.5, Salesforce AI’s Moira, Stability AI’s Stable Audio 2.0, TWIN-GPT, ChatGPT Instant usage

BI-Pro#49: Microsoft Fabric Lifecycle Management, Data Factory Adds CI/CD to Fabric Data Pipelines, Database Mirroring, AWS Well-Architected Data Analytics Lens

Databricks' DBRX, Stability AI's Stable Code Instruct 3B, SambaNova's Samba CoE v0.2, FrugalGPT, Advanced RAG Patterns on Amazon SageMaker

Elevate Your BI Dashboards with Figma

Transforming Web Data with Browse AI

ChatGPT for Data Governance

AI_Distilled #39: Unpacking Mistral Large, Google's Gemini Challenges, and Copilot Enterprise

Get Started with Fabric: Create Your Workspace & Reports

Enhancing Image Search with Vector Similarity

Using ChatGPT for Customer Service

Trending Topics

Streamlining Insights with Microsoft Copilot

AI_Distilled 38: Latest in AI: Sora, Gemini 1.5, and More

Getting Started with Microsoft Guidance

Setting Up Polars for Data Analysis

Leveraging Google Cloud for Custom Endpoint with OpenAI

Create a Free Account To Continue Reading

Sign in to activate your 7-day free access