📬 BIPro #96: your trusted signal through the BI noise.
This week, we zoom in on how data teams are evolving, from the tools they use to the question of who should own them. As gen AI matures and enterprise data landscapes grow more fragmented, thoughtful orchestration and governance matter more than ever.
Here’s what’s sparking conversations in this issue:
🔍 From Spreadsheets to Smart Agents
Build your own AI coding assistant with Ollama and Hugging Face inside JupyterLab, no cloud required.
Google’s Data Science Agent gets tested in the real world: can it really replace a data scientist?
🧠 Smarter, Cleaner Data Starts Here
Get hands-on with 10 Pandas One-Liners to clean up messy datasets fast.
Dive deep into BigQuery’s new Gemini-powered prep tools, now GA.
Learn how SQL Server’s new fuzzy search functions simplify approximate matching.
📈 BI Teams, Tools, and Tradeoffs
Who should own BI? IT ensures control, but the business drives speed. Find out why a hybrid model may be the future.
Follow Prime Video’s dashboarding overhaul with Amazon QuickSight: better governance, lower cost, happier teams.
🧰 Gen AI Meets Real-World Infrastructure
Discover how agents connect to Google Cloud databases securely and in real time.
Understand AtScale’s Universal Semantic Layer, a game-changer for unified logic across BI tools.
Explore Colossus, Google’s not-so-secret storage engine delivering SSD performance at HDD prices.
⚡ Quick Wins & Industry Voices
Capital on Tap’s case study on data masking at scale using DataVeil.
Doris vs. Elasticsearch: who wins on cost, speed, and scalability for real-time analytics?
And a new entry from Melissa on Snowflake Marketplace for instant data quality and enrichment.
Whether you're an engineer digging deep into data pipelines or a decision-maker chasing clarity, this issue gives you the sharpest tools, honest evaluations, and stories from the trenches.
Let’s sharpen your week with insights that matter.
Merlyn Shelley
Growth Lead, Packt
10 Pandas One-Liners for Data Cleaning: This article presents 10 concise pandas one-liners to clean messy datasets, tackling missing values, formatting errors, outliers, and inconsistent categories. From standardizing text and email formats to handling duplicates and validating data, these quick fixes simplify real-world data preparation using minimal code.
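To give a flavor of the kind of one-liners the article covers, here is a minimal pandas sketch on a small, made-up customer table. The column names, the plausible age range of 0 to 120, and the simple email pattern are illustrative assumptions, not taken from the article.

```python
import pandas as pd

# Hypothetical messy customer data (illustrative only)
df = pd.DataFrame({
    "name": ["  Alice ", "bob", "Bob ", "carol", "dave"],
    "email": ["ALICE@Example.com ", "bob@example.com", "BOB@example.com",
              "carol@example", "dave@example.org"],
    "age": [29, 41, 41, None, 250],
    "plan": ["Pro", "pro ", "PRO", "basic", "Basic"],
})

# Trim whitespace and normalize capitalization in a text column
df["name"] = df["name"].str.strip().str.title()
# Normalize email formatting
df["email"] = df["email"].str.strip().str.lower()
# Collapse inconsistent category labels into one spelling
df["plan"] = df["plan"].str.strip().str.lower()
# Treat implausible ages (assumed range 0-120) as missing, then fill with the median
df["age"] = df["age"].where(df["age"].between(0, 120))
df["age"] = df["age"].fillna(df["age"].median())
# Flag emails that fail a deliberately simple pattern check
df["email_valid"] = df["email"].str.fullmatch(r"[^@\s]+@[^@\s]+\.[a-z]{2,}")
# Drop rows that became exact duplicates after normalization
df = df.drop_duplicates().reset_index(drop=True)

print(df)
```

Note that the order matters: duplicates only become detectable after the text columns have been normalized.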
Understanding Database Consistency: This article explains database consistency models in distributed systems, including strong, eventual, causal, monotonic, and read-your-writes consistency. It covers their practical applications, trade-offs with availability and partition tolerance, and guides readers in choosing the right model for different real-world scenarios.
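As a rough illustration of the gap between eventual and read-your-writes consistency, the toy Python model below uses a primary with one asynchronously updated replica. The class names and the version-tracking approach are hypothetical, chosen only to show how a client session can avoid reading its own stale data.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    data: dict = field(default_factory=dict)
    version: int = 0  # position of the last write applied on this node


class ReplicatedStore:
    """Toy store: writes hit the primary and reach the replica only when replicate() runs."""

    def __init__(self):
        self.primary = Node()
        self.replica = Node()
        self.pending = []  # (version, key, value) entries not yet shipped to the replica

    def write(self, key, value):
        self.primary.version += 1
        self.primary.data[key] = value
        self.pending.append((self.primary.version, key, value))
        return self.primary.version

    def replicate(self):
        # Asynchronous replication: apply the backlog to the replica in order
        for version, key, value in self.pending:
            self.replica.data[key] = value
            self.replica.version = version
        self.pending.clear()


class Session:
    """Client session that guarantees read-your-writes on top of the toy store."""

    def __init__(self, store):
        self.store = store
        self.last_write = 0  # version of this session's most recent write

    def write(self, key, value):
        self.last_write = self.store.write(key, value)

    def read(self, key):
        replica = self.store.replica
        # Prefer the replica, but only if it has already applied this session's writes
        node = replica if replica.version >= self.last_write else self.store.primary
        return node.data.get(key)


store = ReplicatedStore()
alice, bob = Session(store), Session(store)
alice.write("cart", "book")
print(alice.read("cart"))  # falls back to the primary, so Alice sees her write
print(bob.read("cart"))    # Bob reads the lagging replica: eventually consistent
```

A session with no writes of its own is happy to read the lagging replica, which is exactly the eventual-consistency behavior; only after `replicate()` runs does every reader converge.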
The future of dashboarding: Prime Video’s migration journey to Amazon QuickSight: Prime Video transformed its business intelligence by migrating from legacy BI tools to Amazon QuickSight. This shift improved performance, reduced costs, and enhanced data governance. Over two years, the team adopted a phased approach, enabling better scalability, automation, and faster decision-making across global teams.
AI-assisted BigQuery data preparation now GA: Gartner notes up to 94% of time in complex industries is spent preparing data. BigQuery data preparation, now generally available, uses Gemini to simplify and automate data wrangling. With visual pipelines, low-code tools, and Git integration, teams streamline transformations, ensure quality, and accelerate analytics workflows efficiently.
A Guide to Integrating ChatGPT with Google Sheets: This guide outlines how to integrate ChatGPT with Google Sheets using the GPT for Sheets add-on. It walks through installation, API setup, and practical use cases, from generating content to analyzing data, empowering users to automate tasks, personalize content, and streamline spreadsheet workflows using AI.
Doris vs Elasticsearch: A Comparison and Cost Case Study: This article compares Apache Doris and Elasticsearch for real-time analytics and search. Doris excels in complex queries, SQL support, and cost efficiency, while Elasticsearch leads in full-text search. A Tencent Music case study shows Doris reduced storage by 70% and boosted performance, making it a strong alternative for scalable analytics.
Accelerate operational analytics with Amazon Q Developer in Amazon OpenSearch Service: Amazon Q Developer now integrates with Amazon OpenSearch Service, allowing users to explore and visualize operational data using natural language. It simplifies alert investigation, speeds up incident resolution, and supports AI-generated summaries, anomaly detection, and dashboard creation, making observability more accessible and reducing time spent on manual troubleshooting.
Implementing Fuzzy Search in SQL Server Using New Inbuilt Functions: Microsoft SQL Server now supports built-in fuzzy search functions like EDIT_DISTANCE and JARO_WINKLER_SIMILARITY, enabling developers to handle name variations and typos directly within T-SQL. These functions improve search accuracy, reduce reliance on external tools, and simplify approximate matching across large datasets, which makes them especially useful for user-facing search and record-matching applications.
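The T-SQL functions are built in, but the metrics behind them are standard. This pure-Python sketch of Levenshtein edit distance and Jaro-Winkler similarity (using the conventional prefix scale of 0.1 and a four-character prefix cap) only shows what is being computed; SQL Server's return types and scaling may differ, so check the Microsoft documentation before relying on specific values.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)  # how far apart matching chars may be
    m1, m2 = [False] * len1, [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not m2[j] and s2[j] == c:
                m1[i] = m2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order (transpositions)
    t, k = 0, 0
    for i in range(len1):
        if m1[i]:
            while not m2[k]:
                k += 1
            if s1[i] != s2[k]:
                t += 1
            k += 1
    t //= 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted for strings that share a common prefix."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


print(edit_distance("kitten", "sitting"))           # 3
print(round(jaro_winkler("MARTHA", "MARHTA"), 4))  # 0.9611
```

Edit distance counts raw typos, while Jaro-Winkler rewards a shared prefix, which is why it tends to suit short strings such as personal names.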
Google’s Data Science Agent: Can It Really Do Your Job? Google’s Data Science Agent, now built into Colab, automates data workflows from EDA to model building using natural language prompts. While it speeds up analysis and corrects errors on the fly, it struggles with iterative edits and nuanced decision-making. It’s a helpful tool, but not yet a full data scientist replacement.
How Colossus optimizes data placement for performance: Google’s Colossus storage system powers services like Gmail, YouTube, and BigQuery, offering SSD-like speed at HDD costs. With innovations like L4-based SSD caching and writeback, Colossus dynamically places hot data on SSDs. This adaptive approach boosts IOPS and throughput while minimizing costs, supporting massive scale without user-side complexity.
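Colossus's placement logic is not published as code, so the following is only a generic sketch of the idea of keeping hot data on a small fast tier. The hypothetical `TieredStore` keeps every block on an "HDD" tier and admits a block to an LRU-managed "SSD" cache once it has been read a set number of times; the admission threshold and capacity are assumptions for illustration.

```python
from collections import Counter, OrderedDict


class TieredStore:
    """Toy two-tier store: every block lives on 'HDD'; hot blocks are also cached on 'SSD'."""

    def __init__(self, ssd_capacity: int, admit_after: int = 2):
        self.hdd = {}                   # durable home of every block
        self.ssd = OrderedDict()        # LRU cache of hot blocks, least recent first
        self.ssd_capacity = ssd_capacity
        self.admit_after = admit_after  # reads needed before a block counts as hot
        self.reads = Counter()          # per-block read counts
        self.stats = Counter()          # which tier served each read

    def put(self, block_id, data):
        self.hdd[block_id] = data
        if block_id in self.ssd:        # keep any cached copy up to date
            self.ssd[block_id] = data

    def get(self, block_id):
        self.reads[block_id] += 1
        if block_id in self.ssd:
            self.ssd.move_to_end(block_id)  # mark as most recently used
            self.stats["ssd"] += 1
            return self.ssd[block_id]
        self.stats["hdd"] += 1
        data = self.hdd[block_id]
        if self.reads[block_id] >= self.admit_after:
            self.ssd[block_id] = data
            if len(self.ssd) > self.ssd_capacity:
                self.ssd.popitem(last=False)  # evict the least recently used block
        return data


store = TieredStore(ssd_capacity=1, admit_after=2)
store.put("a", b"A")
store.put("b", b"B")
for _ in range(3):
    store.get("a")  # first two reads hit HDD, the third is served from SSD
print(dict(store.stats))
```

Requiring repeated reads before admission keeps one-off scans from flushing genuinely hot blocks out of the small fast tier.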
Build Your Own AI Coding Assistant in JupyterLab with Ollama and Hugging Face: This guide walks through building a private AI coding assistant in JupyterLab using Jupyter AI, Ollama, and Hugging Face. It enables offline coding support, including error fixing, autocompletion, and code generation. Running models locally boosts privacy and responsiveness, making it ideal for developers who want control without relying on the cloud.
Capital on Tap Meeting Regulatory Compliance and Explosive Growth with DataVeil Data Masking: Capital on Tap used DataVeil to protect sensitive data and meet compliance requirements such as the GDPR and ISO 27001. With 60 databases and rapid growth, they needed a way to mask data for testing without exposing real information. DataVeil offered automation, consistency, and ease of use, saving time and keeping them compliant.
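One property often wanted from test-data masking is consistency: the same real value should map to the same masked value in every database, so joins across systems still work. A minimal sketch using a keyed HMAC is shown below. It is a generic technique, not a description of how DataVeil works, and `SECRET_KEY`, `mask_email`, and `mask_card` are hypothetical names.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would come from a secrets manager, never source code
SECRET_KEY = b"example-masking-key"


def mask_email(email: str, key: bytes = SECRET_KEY) -> str:
    """Deterministically replace an email: the same input and key always give the same output."""
    normalized = email.strip().lower()
    digest = hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"user_{digest[:12]}@example.com"


def mask_card(number: str) -> str:
    """Hide all but the last four digits of a card number."""
    digits = [c for c in number if c.isdigit()]
    return "*" * (len(digits) - 4) + "".join(digits[-4:])


# The same customer, masked independently in two databases, still lines up
crm_row = {"email": mask_email("Jane.Doe@corp.com")}
billing_row = {"email": mask_email(" jane.doe@corp.com ")}
print(crm_row["email"] == billing_row["email"])  # True
print(mask_card("4111 1111 1111 1234"))
```

Using a keyed HMAC rather than a plain hash means someone without the key cannot rebuild the mapping by hashing a list of known email addresses.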
Who Should Own the Business Intelligence Team - IT or Business? Should the BI team report to IT or the business? IT offers strong governance and technical expertise, while business-led teams move faster and deliver more relevant insights. The best approach is a mix: a central BI team ensures standards and data quality, while business teams focus on their specific needs.
Unlock Instant Data Quality and Data Enrichment on Snowflake Marketplace: Snowflake Marketplace now offers instant access to Melissa’s 23 data products: tools and datasets that help clean, verify, and enrich customer data directly in Snowflake. With no complex setup required, businesses can quickly improve data quality, reduce fraud, and drive better decisions through native apps for email, phone, address, and demographic verification.
Unified, Cost-Effective Text-to-SQL and Business Intelligence with the AtScale Semantic Layer: AtScale’s Universal Semantic Layer helps organizations deliver consistent, cost-effective data access across tools like Power BI, Excel, and Text-to-SQL platforms. By standardizing business logic across diverse data sources, it eliminates duplicated metrics, reduces data silos, and improves performance, without needing new ETL pipelines. This approach ensures accurate, real-time insights for both technical and business users.
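To make "define business logic once" concrete, here is a hypothetical Python sketch of a tiny semantic layer: metrics are declared in one registry and compiled to SQL on demand, so every tool that asks for `net_revenue` gets the same formula. This is not AtScale's API; the `Metric` class, `METRICS` registry, table, and column names are all illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    expression: str  # SQL aggregate, defined exactly once
    table: str


# Single source of truth for business definitions (hypothetical metrics)
METRICS = {
    "net_revenue": Metric("net_revenue", "SUM(amount) - SUM(refund_amount)", "sales"),
    "order_count": Metric("order_count", "COUNT(DISTINCT order_id)", "sales"),
}


def compile_query(metric_name: str, dimensions: list) -> str:
    """Turn a metric request from any client (BI tool, spreadsheet, text-to-SQL) into SQL."""
    m = METRICS[metric_name]
    dims = ", ".join(dimensions)
    select = f"{dims}, " if dims else ""
    group = f" GROUP BY {dims}" if dims else ""
    return f"SELECT {select}{m.expression} AS {m.name} FROM {m.table}{group}"


print(compile_query("net_revenue", ["region"]))
```

Because every consumer goes through `compile_query`, changing how net revenue is calculated is a one-line edit to the registry rather than a hunt through dashboards and spreadsheets.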
Learn how to connect agents to Google Cloud databases: Google Cloud now offers tools to build advanced AI agents that connect directly to databases for real-time, secure data access. With the open-source Gen AI Toolbox for Databases, developers can streamline connections to Google Cloud and open-source databases. This enables agents to query data using natural language, handle complex workflows, and work across graph, vector, and text data models, helping enterprises create smarter, scalable gen AI applications.