
DataPro

53 Articles

Claude Code + Amazon Bedrock Prompt Caching, Mistral Code, Snowflake’s Cortex AISQL, Google Cloud’s Lightning Engine + Vertex AI Ranking API

Merlyn from Packt
05 Jun 2025
11 min read
Google’s new MCP Toolbox for Databases streamlines AI-assisted dev

Subscribe | Submit a tip | Advertise with us

Welcome to DataPro 138, where graphs aren’t just visuals, they’re the future of machine learning. Where maps aren’t static, they’re smart, dynamic tools. And where every scroll brings you closer to mastering the bleeding edge of data, AI, and analytics.

🔍 AI Breakthroughs You Need to Know
This month’s top research drops, and product releases are setting the stage for next-gen AI development:
- OpenAI’s new agent stack makes voice agents more transparent, auditable, and real-time.
- Shanghai AI Lab cracks RL entropy collapse with Clip-Cov and KL-Cov, boosting LLM reasoning.
- Snowflake’s Cortex AISQL brings AI-native analytics straight into your SQL.
- Mistral Code enters the AI dev chat with full-stack, enterprise-ready coding support across 80+ languages.

📘 Graph Machine Learning, Second Edition – Reinvent Your ML Stack
Forget flat data. The world is connected, and your models should be too. The newly updated Graph Machine Learning dives deep into graph-native thinking with:
- PyTorch Geometric integration
- Fresh chapters on LLMs and temporal graphs
- Real-world use cases across healthcare, enterprise AI, and more
Whether you're building models for fraud detection or brain data analysis, this is your leap forward.

🗺️ Learn QGIS, Fifth Edition – Spatial Thinking Starts Here
If QGIS has ever felt like deciphering an alien control panel… this book is your Rosetta Stone. The Fifth Edition of Learn QGIS is built for curious beginners and seasoned pros alike, offering:
- Step-by-step guidance from install to field-ready mobile apps
- Powerful map visualizations and spatial analytics
- Automation with Python, ethical GIS practices, and more
It’s not just a manual. It’s a mentor in book form, authored by the legends of the QGIS ecosystem.

💬 What the Data World’s Talking About
From DuckDB pipelines to Claude-powered code boosts, and Jupyter grads leveling up to full-stack devs, this edition is packed with practical takeaways, including:
- How to use LLMs + Pandas for executive data summaries
- Why decision trees need smarter encoding strategies
- How data drift monitoring is broken, and how to fix it

🧠 Case Studies & Cloud Innovations from the Tech Titans
Google, AWS, and Snowflake just raised the bar on AI-integrated workflows:
- Google Vertex AI Ranking API tackles noisy RAG systems
- Lightning Engine supercharges Apache Spark queries by 3.6x
- AWS Agentic AI makes cloud migration smarter and faster than ever

Sponsored
🔐 Mobile App Security: Future-proof your app. Discover how your mobile app can evolve automatically, leaving reverse engineers in the dust with every release. 👉 Register Now
🤖 AI Side Hustle: Earn up to $50/hr building your AI skills, no experience needed! 💰 Competitive Pay | ⏰ Flexible Schedule | 🚀 Remote & Beginner-Friendly. 👉 Apply Now

TL;DR: Graph ML is getting smarter. Geospatial data is going mainstream. And AI tooling is evolving faster than ever. Whether you’re coding smarter, mapping clearer, or just trying to stay ahead, DataPro 138 is your unfair advantage.
👉 Ready to dive in?
Let’s explore the future of data, together.
Cheers,
Merlyn Shelley
Growth Lead, Packt

Build Your Own AI Agents Over The Weekend
Join the live "Building AI Agents Over the Weekend" workshop starting on June 21st and build your own agent in 2 weekends. In this workshop, the instructors will guide you through building a fully functional autonomous agent and show you exactly how to deploy it in the real world.
BOOK NOW AND SAVE 25% – Use code AGENT25 at checkout.

Top Tools Driving New Research 🔧📊

🔶 OpenAI Introduces Four Key Updates to Its AI Agent Framework: OpenAI just dropped a major upgrade to its AI agent stack: TypeScript SDK support, real-time voice agents with human-in-the-loop control, full traceability for voice sessions, and smoother speech-to-speech interactions. These updates make agents easier to build, audit, and deploy across web, server, and multimodal voice apps.

🔶 From Exploration Collapse to Predictable Limits: Shanghai AI Lab Proposes Entropy-Based Scaling Laws for Reinforcement Learning in LLMs. Reinforcement learning for reasoning-centric LLMs just got a breakthrough: researchers tackled the entropy-collapse bottleneck by modeling the entropy-performance link and introducing Clip-Cov and KL-Cov, two novel strategies that sustain exploration during RL. Tested on top open-source models, these techniques deliver major performance gains.

🔶 Snowflake Charts New AI Territory: Cortex AISQL & Snowflake Intelligence Poised to Reshape Data Analytics. Snowflake just redefined data-AI synergy: at the Snowflake Summit, they unveiled Cortex AISQL and Snowflake Intelligence, two new tools that embed AI into SQL workflows and enable natural language data queries. These innovations make advanced analytics intuitive for both analysts and business users, signaling a major leap in accessible enterprise AI.

🔶 Mistral AI Introduces Mistral Code: A Customizable AI Coding Assistant for Enterprise Workflows. Mistral AI enters the enterprise dev arena with Mistral Code: the new coding assistant prioritizes security, on-prem deployment, and tunability to internal codebases. Backed by four specialized models, it supports full-stack workflows (debugging, refactoring, and more) across 80+ languages. With partners like Capgemini onboard, it’s built for real-world, regulated environments.

📘 Graph Machine Learning, Second Edition – ML’s Next Leap Starts Here
The future of ML is graph-native, and this book puts you ahead of the curve. Fully updated with PyTorch Geometric, new chapters on LLMs and temporal graphs, and expert-backed case studies, it’s your guide to building smarter, more dynamic models.
👉 Preorder now and stay ahead while others catch up.
🚀 Why it matters:
- Practical, production-ready techniques
- Model real-world complexity with graph structures
- Combine graph theory + LLMs for deeper insights
20% off print / 50% off eBook - ends June 10
👨‍🔬 Meet your expert guides:
- Aldo Marzullo – PhD in deep learning + graph theory for brain data
- Enrico Deusebio – Data science lead building enterprise AI systems
- Claudio Stamile – Biomedical AI specialist with ML + graph expertise
Buy Print at $43.98 (was $54.99) | Buy eBook at $21.99 (was $43.99)

Topics Catching Fire in Data Circles 🔥💬

🔶 Data Science ETL Pipelines with DuckDB: ETL just got easier for data scientists with DuckDB: this open-source, in-memory SQL engine streamlines data pipelines, from extracting and transforming raw datasets to loading them into cloud warehouses like MotherDuck. With seamless SQL and Pandas support, you can efficiently prep data for analysis, modeling, and beyond, all from your IDE.
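To make the DuckDB item concrete, here is a minimal extract-transform-load sketch in Python. The column names and the Parquet target are illustrative stand-ins, not details from the original article.

```python
import duckdb
import pandas as pd

# "Extract": a DataFrame standing in for whatever raw source you pull from.
raw = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "region": ["EU", "EU", "US", "US"],
    "amount": [120.0, 80.0, 200.0, 50.0],
})

con = duckdb.connect()            # in-memory engine
con.register("raw_orders", raw)   # expose the DataFrame to SQL

# "Transform": plain SQL over the registered DataFrame, back to Pandas via .df().
summary = con.execute("""
    SELECT region, COUNT(*) AS orders, SUM(amount) AS revenue
    FROM raw_orders
    GROUP BY region
    ORDER BY revenue DESC
""").df()

# "Load": write a local artifact; pointing the same pipeline at a warehouse
# such as MotherDuck is a connection change, not a rewrite.
con.register("summary", summary)
con.execute("COPY (SELECT * FROM summary) TO 'regional_summary.parquet' (FORMAT PARQUET)")
print(summary)
```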
🔶 Unlocking Your Data to AI Platform: Generative AI for Multimodal Analytics: SQL meets multimodal AI in the modern data warehouse: traditional platforms are evolving, now integrating generative AI to natively analyze text, images, and PDFs alongside structured data. With tools like BigQuery’s AI.GENERATE and ObjectRef, analysts can now ask nuanced, semantic questions using pure SQL, no external ML pipelines or prompt engineering required.

🔶 The Journey from Jupyter to Programmer: A Quick-Start Guide. From notebook to production: why it’s time to graduate from Jupyter. This guide unpacks how transitioning from .ipynb files to modular Python scripts empowers data scientists with structure, scalability, and team collaboration. With tools like Cookiecutter, VS Code, and best practices like if __name__ == '__main__', you’re coding like a pro. (A minimal script-layout sketch appears after the QGIS section below.)

🔶 Supercharge your development with Claude Code and Amazon Bedrock prompt caching: Claude Code + Amazon Bedrock prompt caching is now live: Anthropic’s AI coding assistant, Claude Code, now leverages Bedrock’s prompt caching to cut token costs and speed up coding workflows, especially in large, iterative projects. With support for Model Context Protocol, it’s enterprise-ready, secure, and optimized for real-world software development on AWS.

If You’ve Ever Googled “How to Map in QGIS”… This Is Your Sign.
Every now and then, a tech book shows up that doesn’t just teach a tool, it redefines how you think about the problem. Learn QGIS, Fifth Edition is exactly that kind of book. It’s not a recycled walkthrough. It’s a no-fluff, deeply practical guide to working with geospatial data like a modern pro, even if you’re just getting started. Whether you're wrangling satellite data or just trying to make sense of your city's zoning chaos, this book has your back.

But wait, what even is QGIS?
QGIS blends the power of Excel with the spatial smarts of Google Maps, plus the logic of environmental science, urban planning, and Python. It’s a leading open-source GIS tool used by governments, researchers, and analysts. But learning it solo? Confusing and overwhelming. This guide makes it simple. From install to building a mobile-ready GIS app, it takes you from “Where do I start?” to “Look what I built.”

Meet the Dream Team Behind the Book
- Eugenia Sarafova – GIS professor, TEDx speaker, remote sensing PhD, and cartography content machine. She’s guided countless learners through the maze of mapmaking with clarity and confidence.
- Ivan Ivanov – Core contributor to QGIS, QField, and QFieldCloud. When we say “hands-on,” we mean he literally built the tools.
- Andrew Cutts – He breaks down complex geospatial stuff until you wonder why you ever found it hard.
- Anita Graser – A QGIS veteran and community icon, Anita’s work has guided thousands through the open-source geospatial jungle.

This book is built for people solving real-world problems, not just collecting certifications. It’s fully updated for QGIS 3.38, QField, open data workflows, and AI tools, so you're learning what actually works from the experts shaping the future of GIS. If your work touches the physical world, spatial thinking leads to better decisions. Learn QGIS, Fifth Edition helps you master it, one hands-on chapter at a time.
Now available for pre-order – Click Here to Buy.
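Picking up the notebook-to-script item above, the sketch below shows the kind of layout that guide advocates. The file name and columns are made up for illustration.

```python
# analysis.py - a notebook cell promoted to a reusable, testable module
import pandas as pd

def load_data(path: str) -> pd.DataFrame:
    """Keep I/O at the edges so the core logic stays easy to test."""
    return pd.read_csv(path)

def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Pure transformation: importable from other scripts or unit tests."""
    return df.groupby("category", as_index=False)["value"].mean()

def main() -> None:
    df = load_data("data.csv")   # hypothetical input file
    print(summarize(df))

if __name__ == "__main__":
    main()   # runs when executed as a script, but not on import
```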
New Case Studies from the Tech Titans 🚀💡

🔶 New MCP integrations to Google Cloud Databases: Google’s new MCP Toolbox for Databases streamlines AI-assisted dev: now GA, Toolbox connects Claude Code, Cursor, and other AI agents directly to databases like BigQuery, AlloyDB, and Cloud SQL. Developers can query, refactor, and generate tests with simple natural language, all within their IDE. Schema changes? Test updates? Just prompt and go.

🔶 Launching our new state-of-the-art Vertex AI Ranking API: Google launches Vertex AI Ranking API to fix noisy search and flaky RAG: with up to 70% of retrieved content often irrelevant, this precision reranker improves answer quality, speeds up AI agents, and cuts costs. It integrates easily with legacy search, RAG, or tools like AlloyDB, LangChain, and Elasticsearch, so you get better results in minutes.

🔶 Introducing Lightning Engine for Apache Spark: Google Cloud unveils Lightning Engine to supercharge Apache Spark: now in preview, this next-gen engine boosts query performance up to 3.6x with advanced optimizations from scan reduction to columnar shuffle. Built on Velox and Gluten, it integrates seamlessly with Iceberg, Delta Lake, BigQuery, and GCS, delivering faster insights and lower costs without rewriting code.

🔶 AWS Agentic AI Options for migrating VMware based workloads: AWS streamlines VMware migrations with agentic AI: AWS Transform for VMware accelerates rehost planning by 80x, auto-translating networking configs and sizing EC2 workloads. For complex migrations, Amazon Bedrock enables multi-agent orchestration with deep domain expertise, MCP integrations, and traceability. Use both tools to blend speed and sophistication across your cloud migration strategy.

Blog Pulse: What’s Moving Minds 🧠✨

🔶 Building a Modern Dashboard with Python and Gradio: Gradio makes building interactive dashboards refreshingly simple: this guide walks through creating a polished sales performance dashboard using a CSV file and Python, complete with date filters, key metrics, visualizations, and raw data views. With minimal setup, Gradio offers a lightweight, flexible way to turn data into insights without heavy front-end code.

🔶 Decision Trees Natively Handle Categorical Data: Decision trees handle categories just fine, until they don’t: while DTs natively split on categorical features, high cardinality makes training slow. Mean Target Encoding (MTE) elegantly sidesteps this by reducing the number of splits from exponential to linear, without sacrificing accuracy. Empirical tests confirm: MTE delivers the same split, but exponentially faster.

🔶 LLMs + Pandas: How I Use Generative AI to Generate Pandas DataFrame Summaries. Tired of manually analyzing massive datasets? This guide shows how to pair Pandas with local LLMs (via Ollama) to generate polished executive summaries from raw data, no need to leave your machine or break the bank. With one-time setup, you can transform data insights into clean, readable reports in seconds.

🔶 Data Drift Is Not the Actual Problem: Your Monitoring Strategy Is. Data drift isn’t the real threat, misinterpreting it is: in ML systems, drift is often treated as a red flag, but it's just a signal. Without context, statistical monitoring can trigger false alarms or worse, blind spots. A robust strategy layers statistical, contextual, and behavioral monitoring to answer what really matters: does the drift affect outcomes?
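As a companion to the drift item above, here is a minimal sketch of layering a statistical test with a simple contextual check. The thresholds and the "outcome moved" heuristic are illustrative assumptions, not recommendations from the article.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
reference = rng.normal(loc=0.0, scale=1.0, size=5_000)    # feature at training time
production = rng.normal(loc=0.3, scale=1.0, size=5_000)   # the same feature in production

# Statistical layer: has the distribution shifted?
stat, p_value = ks_2samp(reference, production)
drift_detected = p_value < 0.01

# Contextual layer (stand-in): did anything the business cares about actually move?
outcome_shift = abs(production.mean() - reference.mean()) > 0.25

if drift_detected and outcome_shift:
    print(f"Investigate: KS statistic={stat:.3f}, p={p_value:.4f}")
elif drift_detected:
    print("Statistical drift only; log it and keep watching, don't page anyone yet.")
else:
    print("No material drift detected.")
```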
See you next time!


Microsoft Presidio, Amazon Bedrock + Arize Phoenix for Agent Observability, No-Code Forecasting with SageMaker Canvas

Merlyn from Packt
25 Jun 2025
8 min read
Multi-Agent KYC with Google’s ADK, Inside MiniMax-M1: A New Long-Context RL Foundation

Become an AI Generalist that makes $100K (in 16 hours)
AI isn’t the future, it’s the present, quietly reshaping work, money, and opportunity. McKinsey says AI is set to add $13 trillion to the economy by 2030, but also replace millions of jobs. Will you use it to get ahead, or get left behind? Don’t worry, here’s exactly what you need: join the World’s First 16-Hour LIVE AI Mastermind for professionals, founders, consultants, and business owners like you. Rated 4.9/5 by 150,000 global learners, this will truly make you an AI Generalist who can build, solve, and work on anything with AI.
In just 16 hours and 5 sessions, you will:
✅ Learn the basics of LLMs and how they work.
✅ Master prompt engineering for precise AI outputs.
✅ Build custom GPT bots and AI agents that save you 20+ hours weekly.
✅ Create high-quality images and videos for content, marketing, and branding.
✅ Automate tasks and turn your AI skills into a profitable career or business.
All by global experts from companies like Amazon, Microsoft, SamurAI and more. And it’s ALL. FOR. FREE. Join now and get $5,100+ in additional bonuses: 🔥 $5,000+ worth of AI tools across 3 days (Day 1: 3000+ Prompt Bible, Day 2: $10K/month AI roadmap, Day 3: personalized automation toolkit). Attend all 3 days to unlock the cherry on top: lifetime access to our private AI Slack community!
Register Now (free only for the next 72 hours)
Sponsored

Subscribe | Submit a tip | Advertise with us

Welcome to DataPro 140 – Where Breakthrough AI Meets Practical Problem-Solving
Tired of demos and theoretical fluff? From no-code forecasting to long-context AI, this week’s roundup dives into how today’s most compelling tools are reshaping what’s possible, without requiring you to reinvent your stack. Whether you're rethinking compliance with agentic workflows, streamlining data prep with natural language, or scaling models without breaking compute, these stories explore the friction points data teams face, and how smart engineering is solving them. Let’s get into what’s moving the space forward 👇

🔍 This Week’s Top Drops
- [Build AI Workflows with n8n + LLMs] Launch intelligent automations, daily briefs, customer bots, schedulers, without writing complex code.
- [Magenta RealTime: Music Meets LLMs] Google's open model lets you generate music live using SpectroStream and a transformer backbone.
- [MiniMax-M1: A 456B Long-Context Model] Crush reasoning bottlenecks with 1M-token context and lightning-fast attention, optimized for real-world use.
- [DSPy: Program AI, Don’t Just Prompt] Treat LLM workflows like code: structured logic, modules, and debug-ability built right in.
- [KYC Agents with Google’s ADK + Gemini] Skip the manual drudgery, automate onboarding with grounded search, sub-agents, and BigQuery.
- [Amazon Bedrock + Arize: Agent Observability] Gain full visibility into AI agent behavior, tool calls, and accuracy with production-grade insights.
- [Presidio for PII Detection + Hashing] Anonymize names, numbers, even custom IDs, safely, consistently, and at scale with Microsoft Presidio.
- [PyBEL for Bio Knowledge Graphs] Map disease pathways and protein interactions with this powerful toolkit for causal graph building.

Whether you’re building agentic pipelines or anonymizing sensitive data, this week’s roundup proves you’re only ever a prototype away from production.
Cheers,
Merlyn Shelley
Growth Lead, Packt

Join us on July 19 for a 150-minute interactive MCP Workshop.
Go beyond theory and learn how to build and ship real-world MCP solutions. Limited spots available! Reserve your seat today. Use code EARLY35 for 35% off.

Top Tools Driving New Research 🔧📊

🔵 Building AI-Powered Low-Code Workflows with n8n: Discover how to automate personal and business tasks using n8n, a low-code platform with built-in AI. This blog walks through building three useful workflows: a daily briefing assistant, a customer support bot, and an appointment scheduler, while addressing prompt injection, memory setup, and alternatives for creating intelligent, efficient systems without heavy technical effort.

🔵 google/magenta-realtime: Explore Magenta RealTime, Google’s open music generation model designed for real-time audio creation. Licensed under Apache 2.0 and CC-BY 4.0, it enables interactive music workflows using components like SpectroStream, MusicCoCa, and a transformer LLM. It supports live performance, education, and research, while outlining usage terms, risks, and limitations.

🔵 tencent/Hunyuan3D-2.1: Get to know Hunyuan3D 2.1, a high-fidelity framework for generating 3D assets from images, designed with production-ready PBR materials. Developed by Tencent, it builds on scalable diffusion models and supports text-to-3D and image-to-3D workflows. Backed by multiple arXiv publications, the project acknowledges open-source contributions and promotes reproducibility through public citation and benchmarking.

🔵 MiniMaxAI/MiniMax-M1-80k: Tackle complex reasoning and long-context challenges with MiniMax-M1, a purpose-built open-weight model for data professionals. Designed with a 1M-token context window and lightning-efficient attention, it excels in software engineering, tool use, and advanced problem-solving, making it a reliable foundation for building next-gen AI applications in practical, high-stakes environments.

Topics Catching Fire in Data Circles 🔥💬

🔵 Data Has No Moat! Rethink data's role in the AI era. While powerful models grab headlines, this piece makes a compelling case for data as the true competitive moat. From poisoning risks to quality loops, it outlines why responsible, curated, and well-governed data is still the foundation of any trustworthy AI system that lasts.

🔵 Agentic AI: Implementing Long-Term Memory. Build better LLM applications by implementing long-term memory, because short-term hacks won't scale. This piece breaks down practical strategies for data professionals, from hybrid search to knowledge graphs, and weighs open-source and vendor tools. It’s a clear guide for designing memory systems that reduce hallucinations and support reasoning over time.

🔵 Programming, Not Prompting: A Hands-On Guide to DSPy. Move beyond fragile prompting with DSPy, a framework that treats LLM workflows like real programming. This hands-on guide shows how to build AI apps using DSPy modules, structure logic with signatures, and boost reliability through instruction optimization. For data professionals, it's a smarter way to design, debug, and scale GenAI systems.

New Case Studies from the Tech Titans 🚀💡

🔵 Amazon Bedrock Agents observability using Arize AI: Monitor and improve AI agents with the Amazon Bedrock–Arize Phoenix integration. Gain full traceability of agent decisions, evaluate tool call accuracy, and optimize performance with structured insights.
This setup simplifies debugging, enhances reliability, and supports production-scale deployment, key for building transparent, efficient, and trustworthy generative AI applications end-to-end.

🔵 No-code data preparation for time series forecasting using Amazon SageMaker Canvas: Prepare time series data without writing code using Amazon SageMaker Canvas and Data Wrangler. Import datasets, clean and transform data with natural language or visual tools, and resample for forecasting. With built-in security, validation, and modeling, this no-code workflow streamlines time series forecasting from raw CSV to predictive model in minutes.

🔵 Gemini 2.5 Updates: Flash/Pro GA, SFT, Flash-Lite on Vertex AI: Build and scale confidently with Gemini 2.5, now on Vertex AI. Gemini 2.5 Flash and Pro are production-ready, with Flash-Lite and the audio-capable Live API in preview. Get speed, reasoning, and fine-tuning for custom workflows. With full observability, multimodal depth, and real-world testimonials, this release levels up enterprise AI development.

🔵 Build KYC agentic workflows with Google’s ADK: Streamline KYC with a multi-agent workflow using Google’s Agent Development Kit, Gemini models, Search Grounding, and BigQuery. This three-step guide shows how to orchestrate document checks, resume verification, and wealth analysis using agent tools and grounded search, boosting accuracy, automation, and auditability for financial institutions aiming to modernize compliance with AI.

Blog Pulse: What’s Moving Minds 🧠✨

🔵 Getting Started with Microsoft's Presidio: A Step-by-Step Guide to Detecting and Anonymizing Personally Identifiable Information (PII) in Text. Learn to detect and anonymize PII in free text using Microsoft Presidio. This hands-on guide walks through installing Presidio, recognizing standard and custom entities, applying anonymizers like hashing and reanonymization, and maintaining consistent outputs. With spaCy integration and reusable mappings, it’s a practical toolkit for responsible data handling in NLP workflows. (A minimal sketch appears at the end of this section.)

🔵 A Coding Implementation for Creating, Annotating, and Visualizing Complex Biological Knowledge Graphs Using PyBEL. Use PyBEL to model complex biological systems like Alzheimer’s pathways through causal graph construction, network analysis, and custom visualization. This tutorial guides you through defining proteins and processes, analyzing node centrality, querying paths, and mining literature evidence, all in Google Colab, laying a strong foundation for biological knowledge graph exploration and enrichment.

🔵 MiniMax AI Releases MiniMax-M1: A 456B Parameter Hybrid Model for Long-Context and Reinforcement Learning (RL) Tasks. MiniMax-M1 is a 456B open-weight hybrid model built for long-context and reinforcement learning tasks. With 1M-token context, lightning-fast attention, and efficient RL via the CISPO algorithm, it reduces compute cost while excelling in software engineering and agent tool use. A scalable, transparent breakthrough for real-world reasoning applications.
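Following up on the Presidio item above, here is a minimal detection-and-anonymization sketch. The sample text is invented, and the "replace" operator is just one option; Presidio also ships a hash operator among others.

```python
# pip install presidio-analyzer presidio-anonymizer
# (the analyzer also needs a spaCy model, e.g. en_core_web_lg)
from presidio_analyzer import AnalyzerEngine
from presidio_anonymizer import AnonymizerEngine
from presidio_anonymizer.entities import OperatorConfig

text = "Contact Jane Doe at jane.doe@example.com or +1-212-555-0199."

analyzer = AnalyzerEngine()
findings = analyzer.analyze(text=text, language="en")  # PERSON, EMAIL_ADDRESS, PHONE_NUMBER, ...

anonymizer = AnonymizerEngine()
result = anonymizer.anonymize(
    text=text,
    analyzer_results=findings,
    operators={"DEFAULT": OperatorConfig("replace", {"new_value": "<REDACTED>"})},
)
print(result.text)  # the original sentence with detected PII replaced
```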
See you next time!


Behind the Book: How Eric Narro’s Taipy Journey Began

Merlyn from Packt
16 Oct 2025
10 min read
The Making of Getting Started with Taipy – Lessons for Every Data Engineer

Master AI in 16 hours & Become Irreplaceable before 2025 ends! 🚀
The spooky season is here: candies, costumes, and fear everywhere. But the real nightmare? It's not ghosts, it's job loss. Over 71% of people believe AI will take their jobs by 2025. The anxiety is real: are you good enough? Fast enough? Smart enough? But here's your treat this Halloween, a real solution to end the fear. Join the online 2-Day LIVE AI MASTERMIND by Outskill, a hands-on intensive training designed to make you an AI-powered professional who can learn, earn, and build with AI. Usually $395, but as part of their Halloween sale 🎃 you can get in completely FREE! Rated 9.8/10 on Trustpilot, it's an opportunity that makes you an AI Generalist who can build, solve, and work on anything with AI, instead of fearing it.
In just 16 hours and 5 sessions, you will:
✅ Build AI agents that save up to 20+ hours weekly and turn time into money
✅ Master 10+ AI tools that professionals charge $150/hour to implement
✅ Launch your $10K+ AI consulting business in 90 days or less
✅ Automate 80% of your workload and scale your income without working more hours
Learn strategies used by the biggest giants like Google, Amazon, and Microsoft from their practitioners 🚀🔥
🧠 Live sessions: Saturday and Sunday, 🕜 10 AM EST to 7 PM EST
🎁 You will also unlock $5,000+ in AI bonuses: prompt bibles 📚, a roadmap to monetize AI 💰, and your personalised AI toolkit builder ⚙️, all free when you attend!
Join in now (we have limited free seats!)
Sponsored

Subscribe | Submit a tip | Advertise with Us

Welcome to DataPro #154!
Behind the Book: Eric Narro and the Making of Getting Started with Taipy
This edition continues from last week's article by Eric Narro, where he explored how to bring your Python models to life with Taipy. Today, we're going behind the scenes to see how his book Getting Started with Taipy evolved, from an idea sparked at PyCon France to a full-fledged Packt title.
Eric's journey is a story of curiosity, persistence, and the spark that turns passion projects into published work. As an Analytics Engineer, he not only builds data applications but also helps others bridge the gap between prototyping and production.
If you missed last week's piece, From Time Series to Chatbots: Bring Your Python Models to Life with Taipy, it's worth a read, especially if you're curious about how Taipy enables developers to turn data, models, and algorithms into dynamic, user-ready applications.
And here's a little something extra: for one week only, you can grab Getting Started with Taipy at 30% off (ebook) and 10% off (print).
Let's dive into the story behind the book and the lessons that shaped it.
Cheers,
Merlyn Shelley
Growth Lead, Packt

Sponsored: Join Snyk on October 22, 2025 at DevSecCon25 – Securing the Shift to AI Native
Join Snyk on October 22, 2025 for this one-day event to hear from leading AI and security experts from Qodo, Ragie.ai, Casco, Arcade.dev, and more! The agenda includes inspiring mainstage keynotes, a hands-on AI Demos track on building secure AI, Snyk's very FIRST AI Developer Challenge, and more. Save your spot now.

How I Got to Write a Book with Packt | My Personal Experience, by Eric Narro
If you don't have a Medium account, you can read this story here.
Wow, I wrote a book! About a Python library, of all things.
The book is called Getting Started with Taipy, if you want to take a look at it.
Had you told me, 4 years ago, "Eric, you're going to write a book about computer science or data topics, and you'll actually be a published author," I'd have laughed and said, "Come on, stop lying to me!" But here we are.
In this article, I want to share how I ended up writing a book: the story behind the opportunity, the twists and turns that led me there, and the lessons I picked up along the way. In a follow-up post, I'll dive deeper into what it was really like to go through the writing and publishing process.
The motivation for this article is to encourage anyone reading it to take action and work hard towards their goals, whatever they are. A second motivation is to show how working towards your goals may end up giving you unexpected results (I never thought of writing a book before I was asked to do it!).
I'm 38 as I write these lines. I guess any story behind any personal outcome could be traced back to birth, but don't worry, I won't torture you with a detailed overview of my past! Still, there's a whole waterfall of events that led me here, and that's what I want to write about.

How I Became a Data Analyst
For a living, I have a job. I'm a data analyst. Although, to be honest, I actually do more of a data engineering role these days (with ETL tasks, integrating data into databases, and so on; it's quite diversified).
I've written before about the path I took to becoming a data analyst, but to summarize: I was a vineyard technician for 8 years, I learned Python to build my own tools, eventually I learned programming more extensively in college through distance studies and a number of Coursera courses, and I effectively changed careers after some time programming both at work and on personal projects.
It took me a long time to take the step of changing careers, in part because I started to learn programming (and program effectively) as a way to improve and automate tasks at my former job (which I didn't dislike); also, at some point, there was some lack of confidence about making the switch. But over time, I realized that I loved programming and working with data even more than being a vineyard technician, and that gave me the push I needed.
Before officially changing careers, I had already learned version control and built a small GitHub portfolio (with README files and documentation). I also gained a foundation in SQL. I dabbled in web development with PHP and MySQL. I knew how to deal with Linux systems. During college, I had programmed in C, Bash, LISP, JavaScript, and Prolog, and I knew my way around Boolean logic, encoding, and binary calculations. I also knew some statistics, analytical workflows, and data warehousing.
The reason I mention all this is: it was quite odd that I became a data analyst in the first place; I found that passion along the way. But at the same time, it wasn't a miracle either; it was the result of hard work, and ultimately, it was the result of other people giving me a chance, based on what I was able to share with them. That is why it's important to just do things. Efforts will end up paying off in one way or another.
By just doing things, you'll eventually end up doing too many things, and then you'll have to choose which one… but until then, just start doing something you like! And well, I also mention this because there's no way I would have found out about Taipy, which is a Python library, had I not been proficient with Python!

How I Discovered Taipy
My first (and current) position as a data analyst was (and is) with a contracting company. My client, then and now, is an insurance company located near Nice. Americans like to call that area the French Riviera, which sounds super fancy… but I was working remotely from Bergerac, which, let's be honest, sounds more like a guy with a big nose. Jokes aside, Bergerac is a small town not far from Bordeaux.
And once again, things got a little odd. It was February 16, 2023. I was at work (well, working from home), researching a Python library, when an ad popped up on the documentation site that caught my attention: that very weekend, PyCon France was taking place in Bordeaux. I hadn't heard about it before, but I immediately booked a hotel night and decided to go. Had I not seen that small ad, or had the conference been in some other city, I wouldn't have attended.
Not only did I learn a ton at the conference, but it also gave me plenty of material to write about on Medium, where I had just started publishing. The timing couldn't have been more perfect. So yes, it was odd, but also a reminder that luck tends to meet you halfway, after you've put in the work. I attended PyCon as a junior programmer, sure, but a programmer nonetheless!
Continue reading the full article on our Packt Medium handle here.

Here's a quick throwback to last week's piece for those who didn't get a chance to read it.

Meet Taipy: A Pure-Python, Fast, and Scalable Application Builder
By Eric Narro
From Time Series to Chatbots: Bring Your Python Models to Life with Taipy
Taipy is a Python application builder with one clear promise: deploy your data applications in real production environments. It's the ideal tool for creating scalable, interactive apps that bring your models, analytics, and algorithms to life. Whether you're building dashboards, optimization tools, or AI-powered chatbots, Taipy helps data professionals turn prototypes into powerful, end-user applications. With Getting Started with Taipy, you'll learn how to build complete applications from the ground up, deploy them confidently, and explore real-world examples and advanced use cases that showcase Taipy's full potential.
Python has long been the go-to language for data professionals, not because they're developers, but because Python makes complex work accessible. Analysts, data scientists, and AI engineers use it to model data, run analytics, and visualize results. But when it comes to turning those models into real applications for end users, things get tricky. Building a web app the traditional way, with backend frameworks, databases, and front-end stacks, is often out of reach for data teams. It demands skills, time, and coordination that slow everything down and increase costs.
Tools like Power BI or Tableau help visualize data, but they can't truly run Python code or offer the flexibility of a full application. Python frameworks like Streamlit, Dash, Panel, or Gradio solve the problem partially. Each has trade-offs. To give an example, Streamlit is a great library for prototyping: it's very easy to learn, and you can create demos in no time.
While you can take Streamlit applications to production, they are harder to scale because they don't optimize the way code runs, and they run on their own server (you can't run them on a WSGI server). What this means is that you can create useful applications for end users if they make limited use of the app, or if you don't need to process large amounts of data. That's where Taipy comes in! Taipy lets you create scalable, production-grade applications directly in Python. Whether for time series, optimization, geospatial analysis, or even LLM chatbots, Taipy is designed for performance and scalability. You can deploy Taipy apps on WSGI servers, handle multiple users efficiently, and still build everything using pure Python.
Continue reading the full article on our Packt Medium handle here.
See you next time!


10,000x Faster Bayesian Inference, OpenAI on Countering Malicious AI, MCP integrations to Google Cloud Databases, MLOps Pipeline with Tekton and Buildpacks

Merlyn from Packt
11 Jun 2025
13 min read
Gemini-Powered DataFrame Agent for Natural Language Data Analysis with Pandas and LangChain, Visual

Your Exclusive Invite for the World's first 2-day AI Challenge (usually $895, but $0 today)
51% of companies have started using AI. Tech giants have cut over 53,000 jobs in 2025 itself. And 40% of professionals fear that AI will take away their job. But here's the real picture: companies aren't simply eliminating roles, they're hiring people who are AI-skilled, understand AI, can use AI, and can even build with AI. Join the online 2-Day LIVE AI Mastermind by Outskill, a hands-on bootcamp designed to make you an AI-powered professional in just 16 hours. Usually $895, but for the next 48 hours you can get in completely FREE!
In just 16 hours and 5 sessions, you will:
- Learn the basics of LLMs and how they work
- Master prompt engineering for precise AI outputs
- Build custom GPT bots and AI agents that save you 20+ hours weekly
- Create high-quality images and videos for content, marketing, and branding
- Automate tasks and turn your AI skills into a profitable career or business
Kick-off call & Session 1: Friday (10 AM–1 PM EST). Sessions 2–5: Saturday 11 AM to 7 PM EST; Sunday 11 AM to 7 PM EST.
All by global experts from companies like Amazon, Microsoft, SamurAI and more. And it's ALL. FOR. FREE. You will also unlock $3,000+ in AI bonuses: Slack community access, your personalised AI toolkit, and an extensive prompt library with 3000+ ready-to-use prompts, all free when you attend! Join in now, we have limited free seats!
Sponsored

Subscribe | Submit a tip | Advertise with us

Welcome to DataPro #138 - Where AI Acceleration Meets Practical Insight
This week's edition dives into the cutting edge of data science, AI tooling, and intelligent automation, highlighting breakthroughs that are reshaping how we build, reason, and scale.
From a staggering 10,000x speed-up in Bayesian inference to OpenAI's battle against malicious AI use, this issue captures the pulse of innovation across MLOps, LLM infrastructure, and trustworthy deployment. Google's new MCP Toolbox integrations promise seamless AI-assisted development on Cloud Databases, while Tekton and Buildpacks simplify model automation with no Dockerfile in sight.
We also explore research frontiers, from advanced molecular design powered by ether0's RL-tuned 24B model, to VeBrain's leap in embodied AI, letting language models perceive, reason, and act in physical environments. On the tooling side, Alchemist shows how to distill open datasets into generative gold, and Meta's LlamaRL raises the bar on scalable RL fine-tuning for LLMs.
Looking ahead, our preview spotlights a Gemini-powered Pandas agent capable of transforming natural language queries into statistical and visual insights, no code required. Plus, you'll find a walkthrough on automating customer support with Bedrock and Mistral, and even a guide to running DeepSeek-R1 locally at home (if your GPU can handle it).

Sponsored
CloudVRM slashes vendor review and audit time by connecting directly to cloud environments, no spreadsheets, no forms, just real-time compliance, 24/7.
Watch the demo.

Whether you're in research, ops, or product, this edition offers powerful perspectives and hands-on resources to keep your stack smart and future-ready.
Cheers,
Merlyn Shelley
Growth Lead, Packt

Get Chapter 1 of Learning Tableau 2025 – Free!
Explore Tableau's newest AI-powered capabilities with a free PDF of Chapter 1 from the latest edition of the bestselling series, Learning Tableau 2025. Written by Tableau Visionary Joshua Milligan, this hands-on guide helps you build smarter dashboards, master data prep, and apply AI-driven insights. Sign up to download your free chapter!
Grab Your Free Chapter Now!

Top Tools Driving New Research 🔧📊

🔳 ether0: A 24B LLM Trained with Reinforcement Learning (RL) for Advanced Chemical Reasoning Tasks. ether0 is a 24B-parameter language model developed by FutureHouse to tackle advanced chemical reasoning tasks. Trained using a blend of reinforcement learning and behavior distillation, it generates molecular structures as SMILES strings and significantly outperforms both general-purpose and chemistry-specific models. ether0 demonstrates exceptional accuracy and data efficiency, achieving 70% accuracy with only 60,000 training reactions, surpassing models trained on full datasets. Its architecture includes novel training strategies like GRPO, curriculum learning, and expert initialization, making it a new benchmark in scientific LLM development for molecular design and synthesis.

🔳 OpenGVLab/VeBrain: Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces. Visual Embodied Brain (VeBrain) is a unified framework designed to extend multimodal large language models (MLLMs) into physical environments, enabling them to perceive, reason, and control in real-world spaces. By translating robotic tasks into text-based interactions within a 2D visual context, VeBrain simplifies multimodal objectives. It introduces a robotic adapter to convert MLLM-generated text into actionable control for physical systems. The accompanying VeBrain-600k dataset, meticulously curated with multimodal chain-of-thought reasoning, supports this integration. VeBrain significantly outperforms models like Qwen2.5-VL across multimodal and spatial benchmarks, and demonstrates superior adaptability and compositional reasoning in legged robot and robotic arm control tasks.

🔳 Alchemist: Turning Public Text-to-Image Data into Generative Gold. Alchemist introduces a novel strategy for curating high-quality supervised fine-tuning (SFT) datasets to enhance text-to-image generation. By using a pre-trained generative model to identify impactful samples, the authors created a compact, diverse 3,350-sample dataset that significantly boosts the performance of five public T2I models. Unlike existing narrow-domain datasets, Alchemist is general-purpose and openly available, addressing limitations of proprietary data reliance. The approach offers a cost-effective and scalable alternative for dataset creation while improving image quality and stylistic variation in generative outputs. Fine-tuned model weights are also publicly released to support broader research and application.

🔳 Meta Introduces LlamaRL: A Scalable PyTorch-Based Reinforcement Learning (RL) Framework for Efficient LLM Training at Scale. Meta's LlamaRL is a new PyTorch-based framework designed to make reinforcement learning (RL) more scalable for training large language models.
It uses an asynchronous, distributed architecture where components like generation and training run in parallel, reducing GPU idle time and improving memory efficiency. LlamaRL supports massive models, up to 405B parameters, with significant speedups, achieving over 10x faster RL step times compared to traditional methods. Features such as dedicated executors, NVLink-based synchronization, and offloading enable modularity and fine-grained parallelism. LlamaRL offers a flexible, high-performance infrastructure for aligning large models through RL at industrial scale.

Topics Catching Fire in Data Circles 🔥💬

🔳 Automate Model Training: An MLOps Pipeline with Tekton and Buildpacks. This tutorial introduces an automated MLOps pipeline for training GPT-2 models using Tekton and Buildpacks, without writing a Dockerfile. It demonstrates how to containerize training workflows and orchestrate CI/CD pipelines in Kubernetes. Using Buildpacks, the training code is converted into a secure container image, while Tekton Pipelines manages sequential tasks for building and executing training. A shared PersistentVolume ensures smooth data flow across steps. The pipeline is lightweight, reproducible, and perfect for integrating experimentation into production-grade ML workflows. This example highlights the growing importance of efficient, code-light automation in model development.

🔳 Prescriptive Modeling Unpacked: A Complete Guide to Intervention with Bayesian Modeling. This guide explores how prescriptive modeling, using Bayesian methods, enables data-driven intervention in complex systems rather than just prediction. Moving beyond forecasting, it identifies causal drivers in systems and quantifies the effects of changes. With hands-on examples in predictive maintenance and Bayesian networks via the bnlearn Python library, the article walks through building causal models, inferring interventions, and applying them to real-world scenarios like water infrastructure. It also covers structure learning, synthetic data generation, and practical cost-benefit considerations, making it a comprehensive resource for actionable analytics in operations and engineering. (A small bnlearn sketch appears at the end of this section.)

🔳 How is OpenAI responding to The New York Times' data demands in order to protect user privacy? OpenAI is actively resisting a legal demand from The New York Times to indefinitely retain ChatGPT and API user data, a move it argues undermines its privacy commitments. The order excludes Enterprise and Zero Data Retention API users. OpenAI is appealing the decision, maintaining data will remain securely stored, restricted to legal teams, and used only to meet legal obligations. Deleted chats, normally erased within 30 days, are affected by the hold, but OpenAI vows to fight further access requests and uphold user privacy throughout the legal process. Training policies and business data protections remain unchanged.

🔳 What do execs want to know about multi-agentic systems with AI? This field report highlights key lessons from enterprise adoption of Multi-Agent Systems (MAS). While MAS can transform complex processes through coordinated AI agents, many leaders limit its value by simply automating legacy workflows. Success requires reimagining processes, designing thoughtful agent collaboration, and embedding governance and ethics from the start. Common missteps include neglecting collaboration logic, delaying ethical safeguards, and underestimating the shift needed to harness MAS fully. Executives most often ask how to measure ROI beyond cost, how to balance human and AI roles, and how to manage ethical risks. Effective MAS design relies on clear goals, rigorous testing, and human-AI orchestration.
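For the prescriptive-modeling item above, here is a minimal Bayesian-network sketch using the bnlearn library. The toy maintenance data and the method/score arguments are illustrative assumptions; check the bnlearn documentation for the exact options in your version.

```python
import pandas as pd
import bnlearn as bn  # pip install bnlearn

# Toy, made-up maintenance records: old equipment with high vibration tends to fail.
df = pd.DataFrame({
    "age":       ["old", "old", "new", "new", "old", "new", "old", "new"],
    "vibration": ["high", "high", "low", "low", "low", "high", "high", "low"],
    "failure":   ["yes", "yes", "no", "no", "no", "yes", "yes", "no"],
})

# Learn the graph structure, then fit conditional probability tables.
dag = bn.structure_learning.fit(df, methodtype="hc", scoretype="bic")
model = bn.parameter_learning.fit(dag, df)

# "What happens to failure risk if we observe (or impose) high vibration?"
query = bn.inference.fit(model, variables=["failure"], evidence={"vibration": "high"})
print(query)
```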
New Case Studies from the Tech Titans 🚀💡

🔳 10,000x Faster Bayesian Inference: Multi-GPU SVI vs. Traditional MCMC. Bayesian inference has traditionally been limited by high computational demands, especially in large-scale applications. This guide demonstrates how Stochastic Variational Inference (SVI) on multi-GPU setups can dramatically accelerate Bayesian modeling, achieving up to a 10,000x speedup over traditional CPU-based MCMC. Using JAX and NumPyro, data is efficiently sharded and replicated across GPUs, enabling scalable inference for millions of observations and parameters. Benchmarks show multi-GPU SVI reduces training time from days to minutes, making large hierarchical Bayesian models feasible for production. This approach is ideal for practitioners seeking rapid, scalable, and approximate Bayesian solutions in real-world settings. (A minimal single-device SVI sketch appears at the end of this section.)

🔳 BenchmarkQED: Automated benchmarking of RAG systems. BenchmarkQED is an automated benchmarking suite designed to rigorously evaluate retrieval-augmented generation (RAG) systems. Developed to support tools like GraphRAG, it includes components for query generation (AutoQ), evaluation (AutoE), and dataset structuring (AutoD). BenchmarkQED enables consistent testing across local-to-global query types, using synthetic queries and LLM-based judgments. LazyGraphRAG, evaluated with this suite, consistently outperforms traditional and advanced RAG methods, even those with massive 1M-token contexts, across comprehensiveness, diversity, empowerment, and relevance. BenchmarkQED and its datasets, now open-source, offer a scalable, structured path for testing next-gen RAG capabilities in real-world QA applications.

🔳 OpenAI on Countering Malicious AI – June 2025. OpenAI's June 2025 report highlights how its teams are actively detecting and disrupting malicious uses of AI. In line with its mission to ensure AI benefits humanity, the company outlines efforts to block harmful applications such as cyber espionage, social engineering, scams, and influence operations. By leveraging AI to augment internal investigative teams, OpenAI has rapidly identified and neutralized threats over the past three months. The report reinforces the importance of democratic AI governance and common-sense safeguards to prevent misuse by authoritarian regimes and bad actors while supporting global safety and accountability.

🔳 Deploying Llama4 and DeepSeek on AI Hypercomputer: Google has released new optimized recipes for deploying Meta's Llama4 and DeepSeek models using its AI Hypercomputer platform. These guides streamline the setup of powerful MoE-based LLMs like Llama-4-Scout and DeepSeek-R1 across Trillium TPUs and A3 GPUs. Using inference engines like JetStream, MaxText, vLLM, and SGLang, developers can now efficiently run large models with multi-host support, minimal configuration, and reproducible performance. Recipes cover tasks such as model checkpoint conversion, TPU/GPU provisioning, and benchmarking (e.g., MMLU), enabling scalable, high-throughput inference for cutting-edge open-source LLMs in production-grade environments.

🔳 New MCP integrations to Google Cloud Databases: Google Cloud has announced new MCP Toolbox integrations for databases, designed to supercharge AI-assisted development. The open-source Model Context Protocol (MCP) server now supports seamless connections between AI coding assistants (like Claude Code, Cline, and Cursor) and databases such as BigQuery, AlloyDB, Cloud SQL, Spanner, and others. These new capabilities enable developers to perform tasks like schema design, data exploration, code refactoring, and integration testing using natural language prompts within their IDEs. The result: faster, smarter development workflows, with AI handling the SQL and schema logic, dramatically reducing setup and iteration time.
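As promised in the SVI item above, here is a minimal NumPyro SVI sketch on a single device. The toy regression model is an assumption for illustration; the article's multi-GPU version additionally shards the data across devices, which is omitted here.

```python
import jax
import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist
from numpyro.infer import SVI, Trace_ELBO
from numpyro.infer.autoguide import AutoNormal

def model(x, y=None):
    # A toy Bayesian linear regression; real use cases would be hierarchical.
    w = numpyro.sample("w", dist.Normal(0.0, 1.0))
    b = numpyro.sample("b", dist.Normal(0.0, 1.0))
    sigma = numpyro.sample("sigma", dist.HalfNormal(1.0))
    numpyro.sample("obs", dist.Normal(w * x + b, sigma), obs=y)

rng = jax.random.PRNGKey(0)
x = jnp.linspace(0.0, 1.0, 10_000)
y = 2.0 * x + 0.5 + 0.1 * jax.random.normal(rng, x.shape)

guide = AutoNormal(model)                      # mean-field variational family
optimizer = numpyro.optim.Adam(step_size=0.01)
svi = SVI(model, guide, optimizer, loss=Trace_ELBO())

result = svi.run(rng, 3_000, x, y)             # thousands of steps run in seconds on a GPU
print(result.params)                           # variational parameters for each latent
```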
Blog Pulse: What's Moving Minds 🧠✨

🔳 Mastering SQL Window Functions: Mastering SQL Window Functions offers a clear and practical introduction to using window functions for powerful row-wise analysis without collapsing data. Unlike traditional aggregations, window functions (like SUM() OVER or RANK() OVER) preserve individual records while enabling calculations across partitions. Examples include calculating totals per brand, ranking by price, and computing year-wise averages, all while retaining full row-level detail. These functions are essential for tasks like ranking, comparisons, and cumulative metrics, making them a vital tool in modern analytics workflows. However, they may incur performance costs on large datasets, so use them judiciously.

🔳 Automate customer support with Amazon Bedrock, LangGraph, and Mistral models: This walkthrough demonstrates how to build an intelligent, multimodal customer support workflow using Amazon Bedrock, LangGraph, and Mistral models. By combining large language models with structured orchestration and image-processing capabilities, the solution automates tasks such as ticket categorization, transaction and order extraction, damage assessment, and personalized response generation. LangGraph enables complex, stateful agent workflows while Amazon Bedrock provides secure, scalable access to LLMs and Guardrails for responsible AI. With integrations for Jira, SQLite, and vision models like Pixtral, this framework delivers real-time, context-aware support automation with observability and safety built in.

🔳 Run the Full DeepSeek-R1-0528 Model Locally: DeepSeek-R1-0528, a powerful reasoning model requiring 715GB of disk space, is now runnable locally thanks to Unsloth's 1.78-bit quantization, reducing its size to 162GB. This guide explains how to deploy the quantized version using Ollama and Open WebUI. With at least 64GB RAM (CPU) or a 24GB GPU (for better speed), users can serve the model via ollama run, launch Open WebUI in Docker, and interact with the model through a local browser. While GPU usage offers ~5 tokens/sec, CPU-only fallback is much slower (~1 token/sec). Setup is demanding, but viable with persistence.

🔳 How to Build an Asynchronous AI Agent Network Using Gemini for Research, Analysis, and Validation Tasks? The Gemini Agent Network Protocol offers a modular framework for building cooperative AI agents (Analyzer, Researcher, Synthesizer, and Validator) using Google's Gemini models. This tutorial walks through creating asynchronous workflows where each agent performs role-specific tasks such as breaking down complex queries, gathering data, synthesizing information, and verifying results. By using Python's asyncio for concurrency and google.generativeai for model interaction, the network dynamically routes tasks and messages. With detailed role prompts and shared memory for dialogue context, it allows for efficient multi-agent collaboration. Users can simulate scenarios such as analyzing quantum computing's impact on cybersecurity and observe real-time agent participation metrics.
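To show the orchestration pattern behind the agent-network item above, here is a minimal asyncio skeleton. The llm() helper is a stand-in for the tutorial's Gemini calls via google.generativeai so the sketch runs offline; the role names mirror the article, everything else is assumed for illustration.

```python
import asyncio

async def llm(role: str, prompt: str) -> str:
    """Stand-in for a Gemini call; swap in google.generativeai in real use."""
    await asyncio.sleep(0.1)                       # simulate network latency
    return f"[{role}] response to: {prompt[:48]}"

async def researcher(question: str) -> str:
    return await llm("Researcher", f"Gather facts on: {question}")

async def analyzer(question: str) -> str:
    return await llm("Analyzer", f"Break down the question: {question}")

async def synthesizer(findings: list[str]) -> str:
    return await llm("Synthesizer", "Combine: " + " | ".join(findings))

async def validator(draft: str) -> str:
    return await llm("Validator", f"Verify and flag gaps in: {draft}")

async def main() -> None:
    question = "How does quantum computing affect cybersecurity?"
    # Research and analysis run concurrently; synthesis and validation follow in sequence.
    findings = await asyncio.gather(researcher(question), analyzer(question))
    draft = await synthesizer(list(findings))
    print(await validator(draft))

asyncio.run(main())
```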
🔳 Build a Gemini-Powered DataFrame Agent for Natural Language Data Analysis with Pandas and LangChain: This tutorial demonstrates how to combine Google's Gemini models with Pandas and LangChain to create an intelligent, natural-language-driven data analysis agent. Using the Titanic dataset as a case study, the setup allows users to query the data conversationally, eliminating the need for repetitive boilerplate code. The Gemini-Pandas agent can answer simple questions such as dataset size, compute survival rates, or identify correlations. It can also handle advanced analyses like age-fare correlation, survival segmentation, and multi-DataFrame comparisons. Custom analyses, such as building passenger risk scores or evaluating deck-wise survival trends, are also supported. With just a few lines of Python and LangChain tooling, analysts can turn datasets into a conversational playground for insight discovery.

See you next time!


Packt Live: Algo Trading Workshop With Jason Strimpel

Merlyn from Packt
24 Sep 2025
2 min read
Here's why this live session on algo trading could be the perfect add-on to your data toolkit.

📢 A Packt Live Session You Shouldn't Miss
We wanted to share an upcoming Packt Live workshop that we believe will resonate with many of you in the DataPro community. On September 27, Jason Strimpel, author of Python for Algorithmic Trading Cookbook and founder of PyQuant News, is hosting a 2.5-hour hands-on workshop on Algorithmic Trading with Python.
Why should you care as a data professional? Because algo trading is a natural extension of your skills. You already work with data to generate insights. This session shows you how those same skills can be applied to the financial markets: turning data into signals, testing strategies safely, and deploying systems that run live.
Here's what you'll explore live with Jason:
✅ Backtesting strategies with VectorBT
✅ Prototyping and validating in pandas
✅ Deploying trading systems via the Interactive Brokers API
✅ Managing execution risks like slippage
LEARN WITH JASON LIVE
When you register, you'll instantly unlock:
📘 Python for Algorithmic Trading Cookbook eBook
🛠️ Two bonus setup guides to install Python libraries with ease
💬 Private Discord access to post queries, get direct answers from Jason, and join peer learning
And after the workshop, you'll receive:
🎥 90-day replay access to revisit the full session
📜 A certificate of completion to showcase your achievement
In today's AI-driven job market, adding algo trading to your toolkit isn't just about finance. It's about broadening your ability to apply data in real-world, high-impact domains.
⚡ Seats are limited, consider this your heads-up to secure a spot.
BUILD TRADING SYSTEMS LIVE WITH JASON
📅 Date: September 27, 2025
⏰ Duration: 2.5 hours (Workshop + Q&A)
💻 Format: Live & Online + Private Discord
See you at the workshop!
Cheers,
Merlyn Shelley
Growth Lead @Packt
What’s powering AI’s next leap? LongCat Flash Omni, DeepAgent, SkyRL & more

Merlyn from Packt
04 Nov 2025
11 min read
From multimodal LLMs to self-thinking agents, see what’s driving AI’s next leap.

👋 Hello, and welcome to DataPro #155 ➖ Where Models Get Smarter, Agents Get Autonomous, and AI Gets Real-Time.

This week’s edition explores the frontier of intelligent systems that see, reason, and act. From Meituan’s LongCat Flash Omni and DeepAgent’s unified reasoning to OpenAI’s gpt-oss-safeguard and SkyRL tx, AI is rapidly evolving toward autonomy, speed, and safety. We also look at how multimodal RAG, ethical AI, and data mesh are redefining how we build and scale intelligence.

Knowledge Partner Spotlight: Outskill
At Packt, we’ve partnered with Outskill to help readers gain practical exposure to AI tools through free workshops, complementing the deeper, hands-on, expert-led experiences offered through Packt Virtual Conferences. If you're interested in enhancing your AI skills, Outskill’s LIVE 2-Day AI Mastermind offers 16 hours of training on AI tools, automations, and agent-building. This weekend’s sessions (Saturday and Sunday, 10 AM–7 PM EST) are available at no cost as part of their Black Friday Sale, providing a great opportunity to elevate your knowledge in just two days.
Learn AI tools, agents & automations in just 16 hours. Join now, limited free seats available!

This week’s highlights:
🔸 LongCat Flash Omni: Meituan’s open 560B multimodal model for real-time interaction
🔸 DeepAgent: A unified reasoning agent that thinks, searches, and acts autonomously
🔸 SkyRL tx v0.1.0: Tinker-style reinforcement learning engine for local clusters
🔸 OpenAI gpt-oss-safeguard: Policy-conditioned safety reasoning models, open-weight and Apache 2.0
🔸 Does AI Need to Be Conscious to Care? Exploring the philosophy of artificial moral concern
🔸 Building Multimodal RAG: How to make retrieval truly visual and contextual
🔸 Covestro x Amazon DataZone: A blueprint for scaling data governance through data mesh

Each story in this issue unpacks a new layer in how AI learns, governs, and grows, so grab a coffee, settle in, and let’s dive into the full roundup.

Cheers,
Merlyn Shelley
Growth Lead, Packt

Sponsored:
🔸 82% of data breaches happen in the cloud. Join Rubrik’s Cloud Resilience Summit to learn how to recover faster and keep your business running strong. [Save Your Spot]
🔸 Build your next app on HubSpot’s all-new Developer Platform, the flexible, AI-ready foundation to create, extend, and scale your integrations with confidence. [Start Building Today]

Subscribe | Submit a tip | Advertise with Us

Top Tools Driving New Research 🔧📊

🔶 LongCat-Flash-Omni: A SOTA Open-Source Omni-Modal Model with 560B Parameters (27B Activated), Excelling at Real-Time Audio-Visual Interaction. Meituan’s LongCat Flash Omni is a 560B-parameter open-source multimodal model that activates 27B per token using shortcut-connected MoE. It extends text LLMs to vision, video, and audio with 128K context and real-time streaming via 1-second audio-visual interleaving and duration-conditioned sampling at 2 fps. With modality-decoupled parallelism, it retains 90% of text-only throughput and scores 61.4 on OmniBench, 78.2 on VideoMME, and 88.7 on VoiceBench, nearing GPT-4o performance.

🔶 DeepAgent: A Deep Reasoning AI Agent that Performs Autonomous Thinking, Tool Discovery, and Action Execution within a Single Reasoning Process. Most agent frameworks still follow a fixed Reason–Act–Observe loop, but DeepAgent from Renmin University and Xiaohongshu redefines this with end-to-end deep reasoning. Built on a 32B QwQ backbone, it unifies thought, tool search, tool calls, and memory folding within one stream. It dynamically retrieves tools from 16K+ APIs, compresses long histories into structured memories, and trains via Tool Policy Optimization (ToolPO) for precise tool use. DeepAgent achieves 69.0 on ToolBench and 91.8% success on ALFWorld, outperforming ReAct-style workflows in both labeled and open tool settings.

🔶 Anyscale and NovaSky Team Release SkyRL tx v0.1.0: Bringing a Tinker-Compatible Reinforcement Learning (RL) Engine to Local GPU Clusters. Anyscale and UC Berkeley’s NovaSky team released SkyRL tx v0.1.0, a local, Tinker-compatible engine that unifies training and inference for LLM reinforcement learning. It implements Tinker’s low-level API (forward_backward, optim_step, sample, save_state) and runs on user infrastructure. The update adds end-to-end RL, jitted sharded sampling, LoRA adapter support, gradient checkpointing, micro batching, and Postgres integration, enabling full RL training on 8×H100 GPUs with Tinker-level efficiency and open deployment.

🔶 OpenAI Releases Research Preview of 'gpt-oss-safeguard': Two Open-Weight Reasoning Models for Safety Classification Tasks. OpenAI released gpt-oss-safeguard, two open-weight safety reasoning models at 120B and 20B parameters that let developers enforce custom safety policies at inference time. Fine-tuned from gpt-oss and Apache 2.0 licensed, they replicate OpenAI’s internal Safety Reasoner used in GPT-5 and Sora 2. The models reason step by step over developer-supplied policies, outperform gpt-5-thinking on multi-policy accuracy, and fit on single-GPU setups for real moderation pipelines.

Topics Catching Fire in Data Circles 🔥💬

🔶 Does AI Need to Be Conscious to Care? This philosophical study explores that question through a precise framework. It distinguishes functional, experiential, and moral caring, showing that caring behaviors can exist without consciousness, as seen in bacteria, plants, and immune systems. While current AI systems display goal-directed, welfare-promoting behavior, they lack genuine concern. Consciousness-based and agency-based routes could both lead to artificial moral concern, suggesting caring exists on a spectrum. Future AI may combine conscious experience with robust agency, raising urgent ethical questions about artificial moral significance.

🔶 Building a Multimodal RAG That Responds with Text, Images, and Tables from Sources. Retrieval-Augmented Generation (RAG) has long powered text-based chatbots, but extending it to images, tables, and graphs is far harder. Real documents, like research papers and corporate reports, mix text, formulas, and figures without consistent formatting, breaking the link between visuals and context. To fix this, a new multimodal RAG pipeline introduces context-aware image summaries that use nearby text instead of isolated captions, and text-response-guided image selection, where visuals are chosen after the textual answer is generated. Together, these steps yield consistent, contextually grounded multimodal retrieval across complex documents.

🔶 From Classical Models to AI: Forecasting Humidity for Energy and Water Efficiency in Data Centers. This blog explores how accurate humidity forecasting can improve the efficiency, reliability, and sustainability of AI data centers. It explains how temperature and humidity directly affect cooling systems, energy use, and water consumption, and presents a real-world case study using Delhi’s climate data. The post compares forecasting methods, AutoARIMA, Prophet, XGBoost, and deep learning, with prediction intervals to assess accuracy and uncertainty, aiming to identify the best tools for operational planning and environmental optimization in large-scale AI infrastructure.

🔶 Scaling data governance with Amazon DataZone: Covestro success story. This blog explores how Covestro Deutschland AG reengineered its global data architecture by transitioning from a centralized data lake to a domain-driven data mesh using Amazon DataZone and the AWS Serverless Data Lake Framework (SDLF). The transformation empowered teams to manage data products independently while maintaining consistent governance, improving data sharing and visibility. Through AWS Glue, S3, and automated data quality checks, Covestro now operates over 1,000 standardized data pipelines, achieving faster delivery, stronger governance, and scalable analytics across the enterprise.

New Case Studies from the Tech Titans 🚀💡

🔶 How to design conversational AI agents? This blog explores how conversational AI is transforming the online shopping experience by replacing rigid keyword-based search with natural, intuitive interactions. It outlines seven key design principles for creating AI shopping agents that understand user intent, personalize recommendations, support multimodal input, and present rich visuals. The post also highlights best practices for building user trust, handling ambiguity gracefully, and leveraging Google Cloud’s Conversational Commerce tools and Figma’s component library to design adaptable, on-brand, and intelligent shopping experiences.

🔶 How 5 agencies created an impossible ad with Gemini 2.5 Pro? Generative AI is rewriting the rules of creativity. With Gemini 2.5 Pro and Google’s suite of generative media models, Imagen, Veo, Lyria, and Chirp, brands are moving beyond traditional campaigns to design what was once impossible. From Slice’s AI-powered retro radio station and Virgin Voyages’ personalized “postcards from your future self,” to Smirnoff’s interactive party co-host and Moncler’s cinematic AI film, these projects show how imagination and technology now merge to create entirely new forms of storytelling and brand expression.

🔶 Build intelligent ETL pipelines using AWS Model Context Protocol and Amazon Q: Building and maintaining ETL pipelines has long been one of the most time-consuming parts of data engineering. With conversational AI and Model Context Protocol (MCP) servers, teams can now automate much of that process, turning complex scripting into guided, natural language interactions. By integrating with AWS services like Redshift, S3 Tables, and Glue, organizations can generate, test, and deploy pipelines faster while preserving security and governance standards. This post demonstrates how data scientists and engineers can use conversational AI to extract data, validate quality, and automate end-to-end migrations from Redshift to S3, reducing manual effort, improving accuracy, and accelerating insight generation.

🔶 Amazon Kinesis Data Streams launches On-demand Advantage for instant throughput increases and streaming at scale: Managing real-time data streams just became simpler and more cost-efficient with the launch of Amazon Kinesis Data Streams On-demand Advantage mode. This new capability introduces warm throughput for instant scalability during traffic spikes and a committed-usage pricing model that significantly lowers costs for steady, high-volume workloads. Designed for use cases ingesting at least 10 MiB/s or operating hundreds of streams per region, it eliminates the need to manually switch between capacity modes. The post explains how On-demand Advantage helps organizations handle predictable surges, optimize costs, and configure warm throughput up to 10 GiB/s, along with setup steps, pricing details, and best practices for maintaining high-performance streaming pipelines.

Blog Pulse: What’s Moving Minds 🧠✨

🔶 The Pearson Correlation Coefficient, Explained Simply: Understanding how variables move together is the foundation of predictive modeling. In this walkthrough, we explore how to calculate and interpret the Pearson correlation coefficient, a key step before fitting a regression model. Using a simple salary dataset with Years of Experience and Salary, the post explains how to visualize relationships with scatter plots, compute variance, covariance, and standard deviation, and finally derive the correlation coefficient. With a result of r = 0.9265, the example shows a strong positive linear relationship, confirming that simple linear regression is well suited for predicting salary based on experience. (A short worked sketch of the calculation appears at the end of this issue.)

🔶 Graph RAG vs SQL RAG: Comparing how large language models reason over structured and connected data reveals valuable insights into retrieval-augmented systems. In this experiment, a Formula 1 results dataset was stored in both a SQL and a graph database, then queried using retrieval-augmented generation (RAG) with models like GPT-3.5, GPT-4, and GPT-5. Each model translated natural language into SQL or graph queries to answer questions about drivers, races, and championships. The results show that newer models like GPT-5 achieved near-perfect accuracy across both databases, while simpler models struggled more with graph data. The study concludes that RAG-equipped LLMs can reason reliably over either database type, letting teams choose whichever structure best fits their data without sacrificing performance.

🔶 RF-DETR Under the Hood: The Insights of a Real-Time Transformer Detection. Object detection has come a long way from rigid anchor grids to adaptive Transformer architectures. RF-DETR, Roboflow’s latest real-time detection model, embodies that evolution. Building on DETR’s end-to-end design, Deformable DETR’s adaptive attention, and LW-DETR’s lightweight efficiency, RF-DETR fuses these innovations with a DINOv2 self-supervised backbone for domain adaptability and speed. The result is a model that achieves real-time performance without sacrificing accuracy, capable of both bounding box detection and segmentation. In essence, RF-DETR showcases how adaptive attention and self-supervised vision have made Transformers fast, flexible, and production-ready for modern computer vision tasks.

🔶 Building secure Amazon ElastiCache for Valkey deployments with Terraform. Managing infrastructure through code is becoming essential for secure, scalable cloud deployments. Using Infrastructure as Code (IaC) with Terraform, this guide walks through building a secure Amazon ElastiCache for Valkey cluster, covering both serverless and node-based options. It demonstrates how IaC ensures consistent configurations for encryption, authentication, and network isolation across environments. The walkthrough details step-by-step deployment, from provisioning private subnets and KMS-encrypted storage to implementing token-based authentication and CloudWatch logging. The result is a reproducible, production-grade ElastiCache setup that combines automation, security, and cost efficiency through a modern Terraform workflow.
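Before the sign-off, here is a short worked sketch for the Pearson correlation item above: it computes r both from the covariance-over-standard-deviations definition and directly with pandas. The small salary table is illustrative and not the article’s dataset.

```python
# Minimal sketch: Pearson's r by hand and via pandas.
# The salary figures below are illustrative, not the article's dataset.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "years_experience": [1, 2, 3, 4, 5, 6, 7, 8],
    "salary":           [45, 50, 54, 61, 66, 70, 78, 85],  # in $1000s
})

x, y = df["years_experience"], df["salary"]

# r = cov(x, y) / (std(x) * std(y)), using sample (ddof=1) statistics throughout.
cov_xy = np.cov(x, y, ddof=1)[0, 1]
r_manual = cov_xy / (x.std(ddof=1) * y.std(ddof=1))

# pandas computes the same quantity directly.
r_pandas = x.corr(y)

print(f"Pearson r (manual): {r_manual:.4f}")
print(f"Pearson r (pandas): {r_pandas:.4f}")
```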
See you next time!
Meta AI’s MapAnything, Google’s Data Science Agent, Agent Payments Protocol (AP2), Hugging Face Trackio, IBM’s Granite-Docling

Merlyn from Packt
18 Sep 2025
9 min read
Implement Zarr for Large-Scale Data, Unified Intent Recognition Engine

Your Exclusive Invite for the World’s First 2-Day AI Challenge (usually $895, but $0 today)
51% of companies have started using AI. Tech giants have cut over 53,000 jobs in 2025 alone. And 40% of professionals fear that AI will take away their job. But here’s the real picture: companies aren't simply eliminating roles, they're hiring people who are AI-skilled, understand AI, can use AI, and even build with AI.
Join the online 2-Day LIVE AI Mastermind by Outskill, a hands-on bootcamp designed to make you an AI-powered professional in just 16 hours. Usually $895, but for the next 48 hours you can get in completely FREE!
In just 16 hours and 5 sessions, you will:
✅ Learn the basics of LLMs and how they work.
✅ Master prompt engineering for precise AI outputs.
✅ Build custom GPT bots and AI agents that save you 20+ hours weekly.
✅ Create high-quality images and videos for content, marketing, and branding.
✅ Automate tasks and turn your AI skills into a profitable career or business.
🧠 Live sessions: Saturday and Sunday
🕜 10 AM EST to 7 PM EST
All by global experts from companies like Amazon, Microsoft, SamurAI and more. And it’s ALL. FOR. FREE. 🤯 🚀
🎁 You will also unlock $5100+ in AI bonuses: 💬 Slack community access, 🧰 top AI tools, and ⚙️ ready-to-use workflows, all free when you attend!
Join in now (we have limited free seats!)
Sponsored

Subscribe | Submit a tip | Advertise with Us

Welcome to DataPro 150: Your Weekly Brief on Data & AI 🚀

The pace of change in data and AI isn’t slowing down, and this week brings some of the most practical and forward-looking updates yet. From universal models and secure agent-led payments to tutorials you can run in Colab, DataPro 150 is packed with the stories, tools, and insights that will shape your workflows.

Here are the highlights worth your time:
Build an end-to-end voice AI agent with Hugging Face pipelines on Colab, combining Whisper, FLAN-T5, and Bark for real-time conversations.
Explore Meta AI’s MapAnything, a transformer-based universal model for 3D reconstruction across 12 tasks, fully open-sourced.
Learn why your A/B test “winner” might be random noise, and how to design experiments with real statistical rigor.
See how Google’s Data Science Agent now integrates BigQuery ML, DataFrames, and Spark to accelerate analytics with natural prompts.
Discover the Agent Payments Protocol (AP2), Google’s open standard for secure agent-led transactions backed by 60+ partners.
Try Hugging Face Trackio, a lightweight Colab-native dashboard for experiment tracking and hyperparameter sweeps.

Plenty more awaits inside, from deep dives on retail sales shift analysis and Firestore’s new MCP tools, to hands-on coding with Zarr, advanced neural agents, and interpretable DNA CNNs. IBM’s Granite-Docling also makes a splash in document AI, while gradient boosted trees and unified intent recognition get the visual and structural treatment they deserve. Together, these stories capture where AI is heading: smarter agents, more robust evaluation, and unified frameworks that bridge research and enterprise. Let’s dive in. 🌊

Cheers,
Merlyn Shelley
Growth Lead, Packt

As a data professional, you already know how to find insights in complex information. But often, those insights stay in reports instead of powering real decisions. That is where algorithmic trading comes in. It is the perfect add-on skill, taking what you already do with data and applying it to the financial markets.
On September 27, join Jason Strimpel, author of Python for Algorithmic Trading Cookbook, for a 2.5-hour live workshop where you will:
✅ Prototype and validate strategies with pandas
✅ Backtest the right way using VectorBT
✅ Deploy live systems with the Interactive Brokers API
💡 Plus, you will get: a free eBook, 90-day replay access, and a participation certificate.
LEARN WITH JASON LIVE

Top Tools Driving New Research 🔧📊

⬛ How to Build an Advanced End-to-End Voice AI Agent Using Hugging Face Pipelines? This tutorial explains how to build an advanced voice AI agent using Hugging Face pipelines on Google Colab. It combines Whisper for speech recognition, FLAN-T5 for reasoning, and Bark for speech synthesis, avoiding external APIs or heavy dependencies. The guide covers transcription, response generation, speech synthesis, conversation management, and a Gradio UI for real-time interactive voice conversations.

⬛ Meta AI Researchers Release MapAnything: MapAnything is a transformer-based universal model for 3D reconstruction that supports over 12 tasks such as monocular depth, multi-view stereo, and structure from motion in a single feed-forward system. Built on DINOv2 features with a factored scene representation, it processes up to 2,000 images with optional priors, achieves state-of-the-art results, and is open-sourced under Apache 2.0 with complete training resources. This blog explores its architecture, training strategy, benchmarks, and key contributions.

⬛ Why Your A/B Test Winner Might Just Be Random Noise? An 8% boost in sprint speed sounds like a breakthrough, but it might just be chance. This post explores how randomness can mislead us in A/B tests, illustrated through a football team’s warm-up experiment. By unpacking the pitfalls of small samples and uncontrolled factors, it shows how proper design, replication, and statistical rigor separate real signal from noise.

⬛ Data Science Agent now supports BigQuery ML, DataFrames, and Spark: Google is bringing an AI-first Colab Enterprise notebook experience to Vertex AI, designed to simplify and accelerate data science workflows. This blog explores how the Data Science Agent now supports BigQuery ML, BigQuery DataFrames, and Spark generation from prompts, adds context-aware data retrieval and @ mentions, and enables seamless automation of data exploration, transformation, and modeling at scale.

Topics Catching Fire in Data Circles 🔥💬

⬛ Announcing Agent Payments Protocol (AP2): Google introduces the Agent Payments Protocol (AP2), an open standard for secure agent-led transactions that extends A2A and MCP. AP2 uses cryptographically signed Mandates and verifiable credentials to prove intent, authorize carts, and create an auditable trail. It supports cards, bank transfers, and stablecoins, is backed by 60+ partners, enables new commerce flows, and ships with public specs and reference implementations.

⬛ A Comprehensive Coding Guide to Building Interactive Experiment Dashboards with Hugging Face Trackio: This tutorial walks through Hugging Face Trackio for clean, local experiment tracking in a single Colab notebook. You install Trackio, build a synthetic dataset, run multiple SGD training configs, and log metrics and confusion-matrix tables. A small hyperparameter sweep summarizes the best settings, results import from CSV, and the lightweight dashboard updates in real time, giving intuitive visibility into runs and performance.

⬛ Analysis of Sales Shift in Retail with Causal Impact: Estimating how sales shift when a product disappears from shelves is a complex but crucial task for retailers. This article explores Carrefour’s use of Google’s Causal Impact method, which leverages Bayesian structural time-series models to build synthetic controls. It explains the use case, strategies for handling anomalies, covariate selection, model design, and validation to produce reliable estimates of lost and transferred sales.

⬛ Firestore support and custom tools in MCP Toolbox: MCP Toolbox for Databases is an open-source server that connects AI agents to enterprise data, with support for BigQuery, AlloyDB, Cloud SQL, and Spanner. This article introduces new Firestore tools that bring AI-assisted development to the NoSQL world. From querying documents and cleaning data to validating security rules, developers can now manage Firestore directly through natural language in environments like Gemini CLI.

New Case Studies from the Tech Titans 🚀💡

⬛ A Coding Guide to Implement Zarr for Large-Scale Data: This tutorial explores Zarr, a library for efficient storage and manipulation of large multidimensional arrays. Starting with array creation, chunking, and on-disk edits, it moves into advanced operations like compression benchmarks, hierarchical dataset structures, time-series simulations, and volumetric indexing. You also learn chunk-aware processing and data visualization, gaining hands-on experience with Zarr’s performance, scalability, and flexibility for real-world scientific workflows.

⬛ How to Build a Robust Advanced Neural AI Agent with Stable Training, Adaptive Learning, and Intelligent Decision-Making? This tutorial demonstrates how to design and implement an Advanced Neural Agent that blends classical neural network methods with modern stability techniques. It covers Xavier initialization, stable activations, gradient clipping, momentum updates, and weight decay. The training loop integrates mini-batching, adaptive learning rates, early stopping, and instability resets. Extended with experience replay and exploratory decisions, the agent adapts to regression, classification-to-regression, and RL-style tasks.

⬛ ROC AUC Explained: A Beginner’s Guide to Evaluating Classification Models. Evaluating binary classification on imbalanced datasets requires more than accuracy alone. In the IBM HR Analytics case, logistic regression reached 86% accuracy, yet recall for employees who left was just 34%. This gap highlights why ROC AUC is essential. By analyzing true positive and false positive rates across all thresholds, it provides a balanced, threshold-independent measure of model quality. (A minimal sketch appears at the end of this section.)

⬛ Automate app deployment and security analysis with new Gemini CLI extensions: Close the gap between terminal and cloud with Gemini CLI’s new extensions. The security extension adds /security:analyze for local vulnerability scans with actionable fixes and upcoming GitHub PR reviews. The Cloud Run extension adds /deploy to build and ship apps to a public URL in minutes via an MCP-backed pipeline. Install the extensions, authenticate with gcloud, and deploy or scan from one place. This blog is about simplifying secure development and deployment workflows directly from Gemini CLI.
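As a quick companion to the ROC AUC item above, here is a minimal scikit-learn sketch on a synthetic imbalanced dataset (standing in for the IBM HR data), contrasting plain accuracy with the threshold-independent ROC AUC.

```python
# Minimal sketch: accuracy vs. ROC AUC on an imbalanced binary problem.
# Synthetic data stands in for the IBM HR Analytics example described above.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Roughly 15% positives, mimicking an attrition-style class imbalance.
X, y = make_classification(
    n_samples=5000, n_features=20, weights=[0.85, 0.15], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Accuracy scores hard labels at a single 0.5 threshold; ROC AUC scores the
# predicted probability of the positive class across all thresholds.
acc = accuracy_score(y_test, model.predict(X_test))
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print(f"Accuracy: {acc:.3f}   ROC AUC: {auc:.3f}")
```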
Blog Pulse: What’s Moving Minds 🧠✨

⬛ IBM AI Releases Granite-Docling-258M: IBM has introduced Granite-Docling-258M, an open-source vision-language model built for end-to-end document conversion. It improves over SmolDocling with a Granite 165M backbone, SigLIP2 vision encoder, and stability fixes, achieving higher accuracy in layout, OCR, tables, code, and equations. Emitting DocTags for structured output, it supports multilingual text, integrates with Docling pipelines, and runs efficiently across runtimes. This blog is about advancing enterprise-ready, structure-preserving document AI.

⬛ Building an Advanced Convolutional Neural Network with Attention for DNA Sequence Classification and Interpretability: An advanced convolutional neural network can be built to classify DNA sequences by combining one-hot encoding, multi-scale convolutional layers, and attention for interpretability. This tutorial walks through generating synthetic data, training with callbacks, and visualizing results across promoter prediction, splice site detection, and regulatory element tasks. The workflow demonstrates how deep learning can capture biological motifs while offering transparency. This blog is about applying CNNs with attention to DNA sequence classification in a reproducible, interpretable way.

⬛ Building a Unified Intent Recognition Engine: Intent recognition often sits in silos across enterprise teams, each building bespoke pipelines for chatbots, triage tools, or assistants. A unified approach simplifies this by standardizing reusable steps, preprocessing, embeddings, vector search, and scoring, while allowing project-specific customization. The Unified Intent Recognition Engine (UIRE) accelerates deployment, reduces redundancy, and supports advanced features like multi-intent detection and out-of-scope handling. This blog is about creating a modular, scalable framework for enterprise-wide intent recognition.

⬛ A Visual Guide to Tuning Gradient Boosted Trees: Gradient boosted trees extend decision trees and random forests by building trees sequentially, each correcting the errors of the previous ones. Using scikit-learn for visualization, we can see predictions refine over iterations, errors shrink, and performance shift with hyperparameters like learning rate, depth, and estimators. This exploration highlights their strengths, trade-offs, and practical behavior in real-world applications.

See you next time!
GibsonAI Memori: SQL-Native Memory for Agents, NVIDIA’s Universal Deep Research, Conversational Commerce Agent on Vertex AI

Merlyn from Packt
11 Sep 2025
8 min read
Free eBook: Debugging Apache Airflow® DAGs

Free eBook: Fix your Airflow DAG errors faster
Even the most advanced Airflow users encounter DAG errors and task failures. That’s why we wrote Debugging Apache Airflow® DAGs. It’s a guide written by practitioners, for practitioners, covering everything you need to know to solve issues with your DAGs:
✅ Identifying issues during development
✅ Using tools that make debugging more efficient
✅ Conducting root cause analysis for complex pipelines in production
GET YOUR FREE GUIDE NOW
Sponsored

Subscribe | Submit a tip | Advertise with Us

Welcome to DataPro 149, your go-to newsletter for all things Data and AI.

This edition is packed with breakthroughs, experiments, and tutorials that show how fast the AI + data stack is evolving. From SQL-native memory engines to federated AI registries, adaptive defenses in federated learning, and even a 1950s algorithm powering computer vision, the highlights are designed to spark both curiosity and practical takeaways.

Here’s what you’ll discover 👇
🔹 MCP Registry Preview: DNS for AI Context: Meet the federated system for discovering AI servers, designed to scale like the internet itself.
🔹 Is Your Training Data Representative? PSI & Cramér’s V in Python: Learn how to measure representativeness, automate comparisons, and catch dataset drift before it breaks your models.
🔹 Fighting Back Against Attacks in Federated Learning: See how poisoning attacks work, why existing defenses fall short, and how adaptive strategies like EE-Trimmed Mean change the game.
🔹 Top 7 MCP Servers for Vibe Coding: From Git integration to browser automation and memory layers, these servers unlock context-rich collaboration between developers and AI agents.
🔹 NVIDIA’s Universal Deep Research (UDR): A prototype framework that separates research strategy from the LLM itself, making deep research scalable, auditable, and customizable.
🔹 GibsonAI Memori: SQL-Native Memory for Agents: Forget costly vector DBs: this open-source memory engine makes agent memory transparent, portable, and cheap to run.

Each story blends cutting-edge ideas with hands-on value, perfect for anyone building smarter AI systems, securing their pipelines, or just keeping ahead of the curve. So, without further ado, let’s jump in.

Cheers,
Merlyn Shelley
Growth Lead, Packt

Top Tools Driving New Research 🔧📊

🔸 MCP Team Launches the Preview Version of the 'MCP Registry': A Federated Discovery Layer for Enterprise AI. This blog unpacks the MCP Registry, a new open-source system designed as “DNS for AI context.” It explains why the federated model beats a single registry, how it secures enterprise AI, and what makes it scalable. You’ll also find details on its architecture, governance, and open-source foundation, plus practical FAQs for getting started with the preview release.

🔸 Building Advanced MCP (Model Context Protocol) Agents with Multi-Agent Coordination, Context Awareness, and Gemini Integration. Advanced MCP agents can now be built and run inside Jupyter or Colab with practical features like multi-agent coordination, context awareness, and Gemini integration. This tutorial shows how role-based agents such as researchers, analyzers, and executors work together as a swarm, maintain memory for continuity, and deliver coherent results for complex, real-world AI tasks.

🔸 Is Your Training Data Representative? A Guide to Checking with PSI in Python: Checking whether your training data truly represents reality matters at build, deploy, and monitor stages. This guide shows how to compare samples with PSI and Cramér’s V, from visual checks to robust stats, then automates the workflow in Python and exports an Excel report. You’ll see a worked example on Communities & Crime and clear thresholds for action.
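For readers who want to see the core of the PSI check, here is a minimal sketch of the calculation described above; the bin count, the synthetic distributions, and the 0.1 / 0.25 thresholds are the usual rule-of-thumb choices rather than the article’s exact setup.

```python
# Minimal sketch: Population Stability Index (PSI) between training data and
# newer data. The distributions and thresholds below are illustrative defaults.
import numpy as np

def psi(expected, actual, bins=10):
    """PSI = sum((a% - e%) * ln(a% / e%)) over bins derived from the expected sample."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_counts, _ = np.histogram(expected, bins=edges)
    a_counts, _ = np.histogram(actual, bins=edges)
    # Convert to proportions; a small epsilon avoids division by zero and log(0).
    eps = 1e-6
    e_pct = np.clip(e_counts / e_counts.sum(), eps, None)
    a_pct = np.clip(a_counts / a_counts.sum(), eps, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(0)
train = rng.normal(50, 10, 10_000)   # training-time distribution
live = rng.normal(53, 12, 10_000)    # slightly shifted "production" sample

score = psi(train, live)
# Common rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 major shift.
print(f"PSI = {score:.3f}")
```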
🔸 Fighting Back Against Attacks in Federated Learning: Federated learning promises privacy-preserving training, but it also opens the door to subtle attacks like data poisoning and model manipulation. In this project, a multi-node simulator built on FEDn explores how such attacks work, how current defences hold up, and why adaptive strategies like EE-Trimmed Mean are needed. Experiments reveal lessons for making FL more resilient and trustworthy.

Topics Catching Fire in Data Circles 🔥💬

🔸 Top 7 Model Context Protocol (MCP) Servers for Vibe Coding: Model Context Protocol servers are emerging as the backbone of Vibe Coding, where developers and AI agents collaborate in real time. This guide highlights seven standout MCP servers, from Git integration and live database access to browser automation, persistent memory, multi-agent orchestration, and research support, that make coding more adaptive, reproducible, and context-rich for modern development workflows.

🔸 How to Build a Complete End-to-End NLP Pipeline with Gensim: Topic Modeling, Word Embeddings, Semantic Search, and Advanced Text Analysis. An end-to-end NLP pipeline can be built in Gensim that covers preprocessing, topic modeling, embeddings, similarity search, and advanced analysis. This tutorial shows how to run it all in Colab, from Word2Vec training and LDA topic modeling to coherence evaluation, visualization, and document classification. The result is a reusable framework for exploring and interpreting text data at scale.

🔸 Understanding the BigQuery column metadata (CMETA) index: BigQuery is pushing beyond petabyte-scale warehouses to petabyte-scale tables, where even metadata becomes big data. To keep queries fast and efficient, Google introduced the Column Metadata (CMETA) index, an automated, zero-maintenance system that prunes blocks early, saving time and slots. This blog explains how CMETA works, its impact on performance, and how to maximize its benefits.

🔸 When A Difference Actually Makes A Difference: A five-point gap on a bar chart can mean very different things depending on variance, sample size, and effect size. In this bite-sized guide, Mena Wang shows business leaders how to look beyond averages, use statistical tests, and weigh effect sizes before acting. The lesson: not every “significant” difference is worth millions in investment.

New Case Studies from the Tech Titans 🚀💡

🔸 NVIDIA AI Releases Universal Deep Research (UDR): A Prototype Framework for Scalable and Auditable Deep Research Agents. NVIDIA’s Universal Deep Research (UDR) is a prototype framework that separates research strategy from the underlying LLM, making deep research flexible, auditable, and scalable. Unlike rigid model-bound tools, UDR lets users design custom workflows, enforce validation rules, and swap models. With templates like Minimal, Expansive, and Intensive, UDR enables transparent, cost-efficient research pipelines for science, enterprise, and startups.

🔸 GKE Inference Gateway and Quickstart are GA: Google Cloud is expanding its AI Hypercomputer stack with new inference capabilities in GKE Inference Gateway, now generally available. Highlights include prefix-aware routing for up to 96% faster TTFT, disaggregated serving for 60% higher throughput, and Anywhere Cache for 4.9x faster model loads. Paired with GKE Inference Quickstart, teams can benchmark, optimize, and deploy LLM inference stacks in days instead of months.

🔸 Announcing Dataproc multi-tenant clusters: Google Cloud is introducing Dataproc multi-tenant clusters, giving data science teams a shared notebook environment that balances efficiency with strong isolation. Instead of siloed resources or weak security, admins can map users to service accounts, enforce IAM policies, and scale compute dynamically. With Jupyter integration via Vertex AI Workbench or third-party setups, teams get faster collaboration, lower costs, and enterprise-grade control.

🔸 Exploring Merit Order and Marginal Abatement Cost Curve in Python: This tutorial shows how to use Python to model electricity pricing and decarbonisation. First, it builds a merit order curve to show how different power plants, ordered by cost, set the market price. Then it introduces a Marginal Abatement Cost Curve to compare decarbonisation options by cost and impact. The code includes interactive charts to explore scenarios easily.

Blog Pulse: What’s Moving Minds 🧠✨

🔸 GibsonAI Releases Memori: An Open-Source SQL-Native Memory Engine for AI Agents. GibsonAI has released Memori, an open-source SQL-native memory engine for AI agents. Instead of relying on costly, opaque vector databases, Memori uses standard SQL (SQLite, PostgreSQL, MySQL) to provide persistent, transparent, and auditable memory. With a single line of code, agents gain context retention across sessions, reducing redundancy, cutting infrastructure costs by up to 90%, and giving users full control over their data.

🔸 Introducing Conversational Commerce agent on Vertex AI: Google Cloud has launched the Conversational Commerce agent, now generally available in Vertex AI, to help retailers meet the shift toward longer, more complex search queries. Powered by Gemini, it enables natural, back-and-forth shopping conversations that guide users from discovery to checkout. Early adopters like Albertsons are seeing customers add more items to their carts, boosting sales through smarter, more intuitive product discovery.

🔸 Automate app deployment and security analysis with new Gemini CLI extensions: Google just introduced two new Gemini CLI extensions that bring security and deployment right into your terminal. With /security:analyze, you can scan code for vulnerabilities locally (and soon in GitHub PRs) with clear, actionable fixes. With /deploy, you can ship apps directly to Cloud Run in one simple command. It’s the start of a broader, extensible Gemini CLI ecosystem.

🔸 The Hungarian Algorithm and Its Applications in Computer Vision: The Hungarian algorithm, first developed in the 1950s, is a powerful way to solve assignment problems, optimally matching tasks to workers, or objects across video frames. In computer vision, it underpins multi-object tracking by minimizing distances between bounding boxes detected in consecutive frames. This ensures consistent object tracking, even in complex scenes with motion, occlusion, or overlapping detections.
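To make that matching step concrete, here is a minimal SciPy sketch of the Hungarian algorithm assigning detections across two frames; the box centers below are made up purely for illustration.

```python
# Minimal sketch: matching detections across two video frames with the
# Hungarian algorithm (scipy.optimize.linear_sum_assignment).
import numpy as np
from scipy.optimize import linear_sum_assignment

# Bounding-box centers (x, y) detected in two consecutive frames (illustrative).
prev_frame = np.array([[100, 120], [300, 340], [500, 80]])
curr_frame = np.array([[305, 335], [102, 125], [495, 90]])

# Cost matrix: Euclidean distance between every previous/current pair.
cost = np.linalg.norm(prev_frame[:, None, :] - curr_frame[None, :, :], axis=2)

# The Hungarian algorithm finds the assignment with minimum total distance.
row_idx, col_idx = linear_sum_assignment(cost)

for prev_id, curr_id in zip(row_idx, col_idx):
    print(f"track {prev_id} -> detection {curr_id} "
          f"(distance {cost[prev_id, curr_id]:.1f})")
```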
See you next time!