





















































Join the 2-Day Free AI Upskilling Sprint by Outskill, which comes with 16 hours of intensive training on AI frameworks, tools, and tactics that will make you an AI expert. Originally priced at $499, but the first 100 of you get in completely FREE! Claim your spot now for $0! 🎁
Inside the AI Bootcamp, you will learn from mentors at top companies across the globe, like Microsoft, Google, META, and Amazon. 🎁 You will also unlock $3,000+ in AI bonuses: 💬 Slack community access, 🧰 top AI tools, and ⚙️ ready-to-use workflows, all free when you attend!
Sponsored
🎉 Welcome to the 100th Edition of BIPro! 🎉
Your trusted source for business intelligence insights, now with a powerful new twist.
This milestone marks more than just a number: it’s a testament to the incredible community we’ve built together. Your support, feedback, and curiosity have made BIPro the go-to digest for data professionals navigating the rapidly evolving world of BI and analytics.
To celebrate, we’re thrilled to launch the Deep Dive Column, spearheaded by our Content Engineer Ayushi Bulani. This new format tackles some of the most urgent questions in BI, starting with how Generative AI is reshaping (and sometimes complicating) data workflows. From hallucinated SQL to schema confusion, Ayushi breaks down what GenAI can’t do yet, and how data teams can work around those gaps with precision and care.
🔍 Start with the lead story: [What Generative AI Still Can’t Do (and Why That Matters)]
💡 This edition also delivers big updates across the data and AI ecosystem:
⚡ Quick Wins You Don’t Want to Miss:
📣 Plus: Take our AI Frustration Survey and help shape our upcoming issue on prompt engineering for data pros.
As we step into this new phase, let’s build the future of BI together. Share your ideas, feedback, and wishes; we’re listening.
Here’s to the next 100 editions of innovation, insight, and impact.
Cheers,
Merlyn Shelley
Growth Lead, Packt
Get a head start on our upcoming release, Mathematics of Machine Learning by Tivadar Danka, with this free downloadable primer.
🔍 Inside:
📩 Enter your email to get Essential Math for Machine Learning delivered to your inbox within 24 hours.
Overcome architectural pitfalls that slow down GenAI deployments
Achieve zero-copy, real-time, permission-aware data access
See how to use DSPM capabilities for secure, compliant data handling
Sponsored
Generative AI tools like ChatGPT and Copilot are transforming the modern workflow. These tools promise high performance, but they're far from perfect; knowing their limitations is just as important as knowing their strengths. Right off the bat, these models don’t understand your data. They don’t connect predictions to your schema, business logic, or production constraints.
They generate plausible code or queries, but often without validating against actual structures or edge cases. This is where prompt engineering comes in: it doesn’t just mean better phrasing, but translating context, such as schemas, sample data, and naming conventions, into constraints the model can work with. Otherwise, you’re just as likely to get broken logic as usable code.
Providing that structure up front significantly improves the accuracy of the output, and that kind of precision is critical in data projects.
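As a concrete sketch of what "context as constraints" can look like, here is one way to bake a schema into every prompt. The table, columns, and template wording below are hypothetical examples, not a canonical recipe:

```python
# Hypothetical schema used only for illustration.
SCHEMA_DDL = """
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    amount_usd  REAL,  -- NULL while an order is pending
    created_at  TEXT   -- ISO-8601 timestamp, UTC
);
"""

def build_prompt(question: str) -> str:
    """Prepend the schema and explicit constraints so the model
    cannot silently invent tables or columns that do not exist."""
    return (
        "You are writing SQLite SQL against this exact schema:\n"
        f"{SCHEMA_DDL}\n"
        "Constraints: use only the columns above; amount_usd may be NULL; "
        "timestamps are ISO-8601 strings in UTC.\n\n"
        f"Question: {question}\n"
        "Return only the SQL, no explanation."
    )

prompt = build_prompt("Total revenue per customer in 2024")
```

The same template can carry business definitions ("revenue excludes refunds") so the model's answer is grounded in your logic rather than its guesses.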
Then there’s accuracy. You might use GenAI to craft transformation pipelines or write code, but generative AI often hallucinates: it confidently suggests syntax, libraries, or functions that don’t behave as described, or sometimes don’t even exist! This is especially risky when you’re deploying to production or relying on subtle transformations that affect business-critical logic.
That’s why you still need to vet the output carefully. Check the generated code against official documentation, test it in a sandbox, and validate the assumptions it's making. Even better, turn the AI into a research assistant. Ask it to cite its sources, link to relevant docs, or summarize the best practices from trusted repositories. Perhaps even ask the LLM to explain the rationale behind the code it generates. This not only helps you understand what it's trying to do, but also gives you a chance to spot gaps in its logic or mismatches with your data context before integrating anything into your pipeline.
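One lightweight form of that sandbox check, assuming a SQLite-compatible dialect, is to run a generated query against an empty in-memory copy of the schema before it ever touches real data. The schema and queries below are illustrative:

```python
import sqlite3

def validate_sql(sql: str, ddl: str) -> bool:
    """Run a query against an empty in-memory database built from the
    schema. This catches invented tables/columns and syntax errors
    without needing (or risking) production data."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(ddl)
        conn.execute(sql)  # raises sqlite3.OperationalError on bad references
        return True
    except sqlite3.Error:
        return False
    finally:
        conn.close()

ddl = "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount_usd REAL);"
generated_sql = (
    "SELECT customer_id, SUM(amount_usd) AS revenue "
    "FROM orders GROUP BY customer_id;"
)

assert validate_sql(generated_sql, ddl)                    # real columns pass
assert not validate_sql("SELECT no_such_col FROM orders;", ddl)  # hallucinated column fails
```

This doesn’t prove the logic is right, only that the query refers to things that exist; correctness still needs tests with representative data.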
They’re also stateless. Most models can’t track your session context or versioned data logic across interactions. Unless you prompt carefully, they’ll forget key constraints or project-specific naming conventions. A workaround for statelessness is maintaining a session summary: a running list of decisions, assumptions, and outputs that you paste into each new prompt to keep the model aligned.
Until LLMs gain persistent memory or better long-context performance, the burden of context management is on you. Being explicit pays off.
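A session summary doesn’t need tooling; even a plain list you prepend to each request works. The project facts below are made-up examples of the kind of decisions worth carrying forward:

```python
# Running list of decisions and assumptions (contents are hypothetical).
session_summary = [
    "Warehouse is BigQuery; production dataset is `analytics_prod`.",
    "All timestamps are stored in UTC.",
    "'Revenue' means SUM(amount_usd) excluding refunds.",
]

def record(decision: str) -> None:
    """Append each new decision so later prompts stay aligned."""
    session_summary.append(decision)

def with_context(request: str) -> str:
    """Prefix every request with the accumulated project context."""
    header = "Project context (do not contradict):\n" + "\n".join(
        f"- {fact}" for fact in session_summary
    )
    return f"{header}\n\nRequest: {request}"

record("Column `created_at` was renamed to `order_ts` in the v2 schema.")
prompt = with_context("Rewrite the daily revenue query for the v2 schema.")
```

The point is simply that the model sees every standing decision on every turn, instead of relying on memory it doesn’t have.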
Finally, there’s trust. In data engineering, pipelines break when assumptions are wrong. You can’t just eyeball AI output; you need test coverage, validation, and deployment-aware thinking that these tools can’t yet offer. To work around this, treat any AI-generated code or config as a first draft, not production-ready logic, and always assume it’s incomplete. Write unit tests to verify how the generated code actually behaves. In addition, consider working in a virtual environment when testing AI-suggested code: it lets you safely install and trial new dependencies without affecting your core environment or other projects.
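For instance, if the model suggested a small parsing helper for a pipeline, you might pin its behavior with a few edge-case tests before wiring it in. The `normalize_amount` function below is a hypothetical stand-in for AI-generated code:

```python
def normalize_amount(raw):
    """Hypothetical AI-suggested helper: parse a currency string
    like '$1,234.50' into a float; empty/missing values become None."""
    if raw is None or raw.strip() == "":
        return None
    return float(raw.replace("$", "").replace(",", ""))

# Edge cases a model is unlikely to have covered on its own:
assert normalize_amount("$1,234.50") == 1234.50
assert normalize_amount("99") == 99.0
assert normalize_amount("") is None
assert normalize_amount(None) is None
```

Running such tests inside a dedicated virtual environment (created with `python -m venv`) keeps any freshly suggested dependencies isolated from your other projects.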
Used well, generative AI can accelerate boilerplate, improve documentation, and even suggest alternatives. But it’s not a drop-in replacement for domain knowledge, testing discipline, or production-readiness.
Where Does AI Fail You?
Generative AI is everywhere, and it’s not perfect. If you’ve ever been frustrated by code hallucinations, vague answers, or a knowledge gap when it comes to your industry, we want to know. Help us map the real-world gaps in AI adoption by sharing your experience. We’ll publish the anonymized results in an upcoming issue on prompt engineering for data professionals.
JULY 16–18 | LIVE (VIRTUAL)
20+ ML Experts | 25+ Sessions | 3 Days of Practical Machine Learning and 35% OFF
Use code EARLY35 at checkout
Learn Live from Sebastian Raschka, Luca Massaron, Thomas Nield, and many more.
⭕ Extracting deeper insights with Fabric Data Agents in Copilot in Power BI: Microsoft announces the integration of Fabric Data Agents with Copilot in Power BI, enabling users to query multiple Fabric resources like lakehouses, warehouses, and KQL databases using natural language. This standalone Copilot experience enhances data discovery, insight extraction, and workflow efficiency.
⭕ Microsoft Fabric Spark: Native Execution Engine now generally available. The Fabric Spark Native Execution Engine (NEE) is now generally available in Fabric Runtime 1.3, offering up to 6× faster Spark workloads on lakehouses with no code changes. Built on Apache Gluten and Velox, NEE boosts Delta/Parquet processing with C++ vectorized execution.
⭕ Techniques for improving text-to-SQL: Google Cloud’s Gemini-powered text-to-SQL enables users to generate SQL from natural language, enhancing productivity across BigQuery, CloudSQL, AlloyDB, and more. Techniques like schema retrieval, intent disambiguation, self-consistency, and validation address challenges in context understanding, SQL dialects, and accuracy, ensuring high-quality SQL generation at scale.
⭕ How Looker’s semantic layer enhances gen AI trustworthiness: Looker’s semantic layer ensures trusted, consistent AI-driven business intelligence by grounding LLM responses in governed data models via LookML. It improves accuracy, reduces hallucinations, supports centralized metrics, and enhances natural language analytics, making conversational BI reliable, interpretable, and aligned with business definitions.
⭕ How Opendoor transformed business intelligence with Amazon QuickSight: Opendoor transformed its business intelligence by migrating to Amazon QuickSight, achieving 80% cost savings, faster dashboards, and self-service analytics for non-technical users. QuickSight’s SPICE engine, natural language querying via Amazon Q, and external embedding enhanced data accessibility, performance, and partner collaboration organization-wide.
⭕ What is embedded analytics? Metabase’s embedded analytics empowers SaaS apps to deliver interactive, white-labeled dashboards within their product. It supports self-service exploration, enhances user retention, reduces support requests, and enables scalable data access. With iframe, SDK, or custom builds, teams can integrate analytics with speed, security, and flexibility.
⭕ How to Sort Combo Box Values in a Power Apps Canvas App: This Power Apps guide shows how to sort Combo Box and Drop Down values using the Sort and Value functions for better user experience. Demonstrated with SharePoint Lists, it supports text and numeric fields, ensuring clean, user-friendly UI in Canvas Apps.
⭕ Top 5 Things You Should Know About Azure Data Factory: This onboarding guide outlines key Azure Data Factory (ADF) essentials, comparing ADF, Synapse, and Fabric Pipelines. It covers pipeline orchestration, dataflows, connection management, and billing complexities, stressing cost impacts of runtime and activity types. It’s crucial for teams migrating between Azure and Fabric.
⭕ MongoDB Atlas is Now Available as a Microsoft Azure Native Integration: MongoDB Atlas is now available as an Azure Native Integration (ANI), enabling seamless deployment, unified billing, and native access via the Azure Portal. This boosts AI-driven development, real-time analytics, and secure scalability, while simplifying operations through Azure-native tools, including Service Connector and Entra ID.