From multimodal LLMs to self-thinking agents, see what's driving AI's next leap.

👋 Hello, welcome to DataPro #155 ➖ Where Models Get Smarter, Agents Get Autonomous, and AI Gets Real-Time.

This week's edition explores the frontier of intelligent systems that see, reason, and act. From Meituan's LongCat Flash Omni and DeepAgent's unified reasoning to OpenAI's gpt-oss-safeguard and SkyRL tx, AI is rapidly evolving toward autonomy, speed, and safety. We also look at how multimodal RAG, ethical AI, and data mesh are redefining how we build and scale intelligence.

Knowledge Partner Spotlight: Outskill

At Packt, we've partnered with Outskill to help readers gain practical exposure to AI tools through free workshops, complementing the deeper, hands-on, expert-led experiences offered through Packt Virtual Conferences.

If you're interested in enhancing your AI skills, Outskill's LIVE 2-Day AI Mastermind offers 16 hours of training on AI tools, automations, and agent building. This weekend's sessions (Saturday and Sunday, 10 AM–7 PM EST) are available at no cost as part of their Black Friday Sale, a great opportunity to elevate your knowledge in just two days.

Learn AI tools, agents & automations in just 16 hours.
Join now, limited free seats available!

This week's highlights:

🔸 LongCat Flash Omni: Meituan's open 560B multimodal model for real-time interaction
🔸 DeepAgent: A unified reasoning agent that thinks, searches, and acts autonomously
🔸 SkyRL tx v0.1.0: A Tinker-style reinforcement learning engine for local clusters
🔸 OpenAI gpt-oss-safeguard: Policy-conditioned safety reasoning models, open-weight and Apache 2.0
🔸 Does AI Need to Be Conscious to Care? Exploring the philosophy of artificial moral concern
🔸 Building Multimodal RAG: How to make retrieval truly visual and contextual
🔸 Covestro x Amazon DataZone: A blueprint for scaling data governance through data mesh

Each story in this issue unpacks a new layer in how AI learns, governs, and grows, so grab a coffee, settle in, and let's dive into the full roundup.

Cheers,
Merlyn Shelley
Growth Lead, Packt

Sponsored:

🔸 82% of data breaches happen in the cloud. Join Rubrik's Cloud Resilience Summit to learn how to recover faster and keep your business running strong. [Save Your Spot]
🔸 Build your next app on HubSpot's all-new Developer Platform, the flexible, AI-ready foundation to create, extend, and scale your integrations with confidence. [Start Building Today]

Subscribe | Submit a tip | Advertise with Us

Top Tools Driving New Research 🔧📊

🔶 LongCat-Flash-Omni: A SOTA Open-Source Omni-Modal Model with 560B Parameters (27B Activated), Excelling at Real-Time Audio-Visual Interaction. Meituan's LongCat Flash Omni is a 560B-parameter open-source multimodal model that activates 27B per token using shortcut-connected MoE. It extends text LLMs to vision, video, and audio with 128K context and real-time streaming through 1-second audio-visual interleaving and duration-conditioned sampling at 2 fps. With modality-decoupled parallelism, it retains 90% of text-only throughput and scores 61.4 on OmniBench, 78.2 on VideoMME, and 88.7 on VoiceBench, nearing GPT-4o performance.

🔶 DeepAgent: A Deep Reasoning AI Agent that Performs Autonomous Thinking, Tool Discovery, and Action Execution within a Single Reasoning Process. Most agent frameworks still follow a fixed Reason–Act–Observe loop, but DeepAgent from Renmin University and Xiaohongshu redefines this with end-to-end deep reasoning. Built on a 32B QwQ backbone, it unifies thought, tool search, tool call, and memory folding within one stream. It dynamically retrieves tools from 16K+ APIs, compresses long histories into structured memories, and trains via Tool Policy Optimization (ToolPO) for precise tool use. DeepAgent achieves 69.0 on ToolBench and 91.8% success on ALFWorld, outperforming ReAct-style workflows in both labeled and open tool settings.
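For readers curious what a single-stream, tool-using reasoning loop looks like in practice, here is a minimal, hypothetical sketch. The class and function names (StubLLM, fold_memory, the toy tool catalogue) are illustrative assumptions, not DeepAgent's released code.

```python
# A hypothetical, minimal sketch of a DeepAgent-style loop: one reasoning
# stream that interleaves thinking, tool retrieval, tool calls, and memory
# folding. The LLM and tool index are stubbed out; names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Step:
    kind: str                 # "search" | "call" | "answer"
    payload: dict = field(default_factory=dict)

class StubLLM:
    """Stands in for the reasoning backbone; returns a scripted plan."""
    def __init__(self):
        self.script = [
            Step("search", {"query": "weather api"}),
            Step("call", {"tool": "get_weather", "args": {"city": "Berlin"}}),
            Step("answer", {"text": "It is 12°C in Berlin."}),
        ]
    def generate(self, context: str, memory: list) -> Step:
        return self.script.pop(0)

TOOL_CATALOGUE = {"get_weather": lambda args: {"temp_c": 12}}   # toy tool index

def fold_memory(context: str) -> str:
    """Compress a long interaction history into a short structured note."""
    return f"[folded: {len(context)} chars of earlier steps]"

def run_agent(task: str, llm, max_steps: int = 10) -> str:
    context, memory = f"Task: {task}", []
    for _ in range(max_steps):
        step = llm.generate(context, memory)
        if step.kind == "search":
            # retrieval over a large API catalogue (16K+ tools in the paper)
            hits = [n for n in TOOL_CATALOGUE if step.payload["query"].split()[0] in n]
            context += f"\nRetrieved tools: {hits}"
        elif step.kind == "call":
            result = TOOL_CATALOGUE[step.payload["tool"]](step.payload["args"])
            context += f"\nTool result: {result}"
        elif step.kind == "answer":
            return step.payload["text"]
        if len(context) > 2000:            # fold long histories instead of truncating
            memory.append(fold_memory(context))
            context = f"Task: {task}"
    return "No answer within the step budget."

print(run_agent("What is the weather in Berlin?", StubLLM()))
```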
🔶 Anyscale and NovaSky Team Releases SkyRL tx v0.1.0: Bringing a Tinker-Compatible Reinforcement Learning (RL) Engine to Local GPU Clusters. Anyscale and UC Berkeley's NovaSky team released SkyRL tx v0.1.0, a local, Tinker-compatible engine that unifies training and inference for LLM reinforcement learning. It implements Tinker's low-level API (forward_backward, optim_step, sample, save_state) and runs on user infrastructure. The update adds end-to-end RL, jitted sharded sampling, LoRA adapter support, gradient checkpointing, micro-batching, and Postgres integration, enabling full RL training on 8×H100 GPUs with Tinker-level efficiency and open deployment.

🔶 OpenAI Releases Research Preview of 'gpt-oss-safeguard': Two Open-Weight Reasoning Models for Safety Classification Tasks. OpenAI released gpt-oss-safeguard, two open-weight safety reasoning models at 120B and 20B parameters that let developers enforce custom safety policies at inference time. Fine-tuned from gpt-oss and Apache 2.0 licensed, they replicate OpenAI's internal Safety Reasoner used in GPT-5 and Sora 2. The models reason step by step over developer-supplied policies, outperform gpt-5-thinking on multi-policy accuracy, and fit on single-GPU setups for real moderation pipelines.

Topics Catching Fire in Data Circles 🔥💬

🔶 Does AI Need to Be Conscious to Care? This philosophical study explores that question through a precise framework. It distinguishes functional, experiential, and moral caring, showing that caring behaviors can exist without consciousness, as seen in bacteria, plants, and immune systems. While current AI systems display goal-directed, welfare-promoting behavior, they lack genuine concern. Consciousness-based and agency-based routes could both lead to artificial moral concern, suggesting caring exists on a spectrum. Future AI may combine conscious experience with robust agency, raising urgent ethical questions about artificial moral significance.

🔶 Building a Multimodal RAG That Responds with Text, Images, and Tables from Sources. Retrieval-Augmented Generation (RAG) has long powered text-based chatbots, but extending it to images, tables, and graphs is far harder. Real documents, like research papers and corporate reports, mix text, formulas, and figures without consistent formatting, breaking the link between visuals and context. To fix this, a new multimodal RAG pipeline introduces context-aware image summaries that use nearby text instead of isolated captions, and text-response-guided image selection, where visuals are chosen after the textual answer is generated. Together, these steps yield consistent, contextually grounded multimodal retrieval across complex documents.
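To make those two ideas concrete, here is a minimal, hypothetical sketch of a multimodal retrieval step, assuming a toy embedding function and a pre-generated text answer; it is not the article's actual pipeline.

```python
# Sketch of (1) context-aware image summaries built from nearby text rather
# than captions, and (2) text-response-guided image selection after the
# textual answer exists. embed() and the document chunks are illustrative.
import math

def embed(text: str) -> list[float]:
    # Toy embedding: normalized character-frequency vector (stand-in for a real encoder).
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - 97] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a, b):
    return sum(x * y for x, y in zip(a, b))

def summarize_image(image_id: str, nearby_text: str) -> str:
    # (1) Context-aware summary: describe the figure using surrounding prose,
    # which is what ties "Figure 3" back to the concept it illustrates.
    return f"{image_id}: figure discussed in context: {nearby_text[:120]}"

def answer_with_images(question: str, chunks: list[dict], llm_answer: str):
    summaries = [
        (c["image_id"], summarize_image(c["image_id"], c["nearby_text"]))
        for c in chunks if c.get("image_id")
    ]
    # (2) Select visuals by similarity to the *generated answer*, not the question.
    answer_vec = embed(llm_answer)
    ranked = sorted(summaries, key=lambda s: cosine(embed(s[1]), answer_vec), reverse=True)
    return llm_answer, [img for img, _ in ranked[:2]]

chunks = [
    {"image_id": "fig_3", "nearby_text": "Figure 3 shows quarterly revenue growth by region."},
    {"image_id": "fig_7", "nearby_text": "Figure 7 depicts the molecular structure of the catalyst."},
]
text_answer = "Revenue grew fastest in the APAC region during Q3."
print(answer_with_images("How did revenue change?", chunks, text_answer))
```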
🔶 From Classical Models to AI: Forecasting Humidity for Energy and Water Efficiency in Data Centers. This blog explores how accurate humidity forecasting can improve the efficiency, reliability, and sustainability of AI data centers. It explains how temperature and humidity directly affect cooling systems, energy use, and water consumption, and presents a real-world case study using Delhi's climate data. The post compares forecasting methods (AutoARIMA, Prophet, XGBoost, and deep learning) using prediction intervals to assess accuracy and uncertainty, aiming to identify the best tools for operational planning and environmental optimization in large-scale AI infrastructure.

🔶 Scaling data governance with Amazon DataZone: Covestro success story. This blog explores how Covestro Deutschland AG reengineered its global data architecture by transitioning from a centralized data lake to a domain-driven data mesh using Amazon DataZone and the AWS Serverless Data Lake Framework (SDLF). The transformation empowered teams to manage data products independently while maintaining consistent governance, improving data sharing and visibility. Through AWS Glue, S3, and automated data quality checks, Covestro now operates over 1,000 standardized data pipelines, achieving faster delivery, stronger governance, and scalable analytics across the enterprise.

New Case Studies from the Tech Titans 🚀💡

🔶 How to design conversational AI agents? This blog explores how conversational AI is transforming the online shopping experience by replacing rigid keyword-based search with natural, intuitive interactions. It outlines seven key design principles for creating AI shopping agents that understand user intent, personalize recommendations, support multimodal input, and present rich visuals. The post also highlights best practices for building user trust, handling ambiguity gracefully, and leveraging Google Cloud's Conversational Commerce tools and Figma's component library to design adaptable, on-brand, and intelligent shopping experiences.

🔶 How 5 agencies created an impossible ad with Gemini 2.5 Pro. Generative AI is rewriting the rules of creativity. With Gemini 2.5 Pro and Google's suite of generative media models (Imagen, Veo, Lyria, and Chirp), brands are moving beyond traditional campaigns to design what was once impossible. From Slice's AI-powered retro radio station and Virgin Voyages' personalized "postcards from your future self," to Smirnoff's interactive party co-host and Moncler's cinematic AI film, these projects show how imagination and technology now merge to create entirely new forms of storytelling and brand expression.

🔶 Build intelligent ETL pipelines using AWS Model Context Protocol and Amazon Q: Building and maintaining ETL pipelines has long been one of the most time-consuming parts of data engineering. With conversational AI and Model Context Protocol (MCP) servers, teams can now automate much of that process, turning complex scripting into guided, natural-language interactions. By integrating with AWS services like Redshift, S3 Tables, and Glue, organizations can generate, test, and deploy pipelines faster while preserving security and governance standards. This post demonstrates how data scientists and engineers can use conversational AI to extract data, validate quality, and automate end-to-end migrations from Redshift to S3, reducing manual effort, improving accuracy, and accelerating insight generation.

🔶 Amazon Kinesis Data Streams launches On-demand Advantage for instant throughput increases and streaming at scale: Managing real-time data streams just became simpler and more cost-efficient with the launch of Amazon Kinesis Data Streams On-demand Advantage mode. This new capability introduces warm throughput for instant scalability during traffic spikes and a committed-usage pricing model that significantly lowers costs for steady, high-volume workloads. Designed for use cases ingesting at least 10 MiB/s or operating hundreds of streams per region, it eliminates the need to manually switch between capacity modes. The post explains how On-demand Advantage helps organizations handle predictable surges, optimize costs, and configure warm throughput up to 10 GiB/s, along with setup steps, pricing details, and best practices for maintaining high-performance streaming pipelines.
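For teams getting started, here is a brief boto3 sketch of creating and switching Kinesis streams to on-demand capacity mode, the baseline the new Advantage offering builds on. The stream names are placeholders, and the Advantage-specific warm-throughput settings are intentionally omitted rather than guessed.

```python
# Working with Kinesis Data Streams in on-demand capacity mode via boto3.
# Stream names and region are placeholders; Advantage-mode warm-throughput
# configuration is not shown here.
import boto3

kinesis = boto3.client("kinesis", region_name="us-east-1")

# Create a new stream in on-demand mode (no shard count to manage).
kinesis.create_stream(
    StreamName="clickstream-events",
    StreamModeDetails={"StreamMode": "ON_DEMAND"},
)

# Or switch an existing provisioned stream to on-demand capacity.
summary = kinesis.describe_stream_summary(StreamName="orders")["StreamDescriptionSummary"]
if summary["StreamModeDetails"]["StreamMode"] == "PROVISIONED":
    kinesis.update_stream_mode(
        StreamARN=summary["StreamARN"],
        StreamModeDetails={"StreamMode": "ON_DEMAND"},
    )
```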
Blog Pulse: What's Moving Minds 🧠✨

🔶 The Pearson Correlation Coefficient, Explained Simply: Understanding how variables move together is the foundation of predictive modeling. In this walkthrough, we explore how to calculate and interpret the Pearson correlation coefficient, a key step before fitting a regression model. Using a simple salary dataset with Years of Experience and Salary, the post explains how to visualize relationships with scatter plots, compute variance, covariance, and standard deviation, and finally derive the correlation coefficient. With a result of r = 0.9265, the example shows a strong positive linear relationship, confirming that simple linear regression is well suited for predicting salary based on experience.
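For a hands-on feel, here is a short NumPy sketch of the same calculation on a made-up salary dataset; because the numbers are invented, the resulting r differs from the article's 0.9265.

```python
# Pearson correlation computed two ways on a small, invented salary dataset.
import numpy as np

years = np.array([1, 2, 3, 5, 7, 10], dtype=float)
salary = np.array([45, 50, 60, 80, 95, 130], dtype=float)   # in thousands

# r = cov(x, y) / (std(x) * std(y))
cov_xy = np.mean((years - years.mean()) * (salary - salary.mean()))
r_manual = cov_xy / (years.std() * salary.std())

# Cross-check against NumPy's built-in correlation matrix.
r_builtin = np.corrcoef(years, salary)[0, 1]

print(round(r_manual, 4), round(r_builtin, 4))   # both ≈ 0.997 for this toy data
```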
🔶 Graph RAG vs SQL RAG: Comparing how large language models reason over structured and connected data reveals valuable insights into retrieval-augmented systems. In this experiment, a Formula 1 results dataset was stored in both a SQL database and a graph database, then queried using retrieval-augmented generation (RAG) with models like GPT-3.5, GPT-4, and GPT-5. Each model translated natural language into SQL or graph queries to answer questions about drivers, races, and championships. The results show that newer models like GPT-5 achieved near-perfect accuracy across both databases, while simpler models struggled more with graph data. The study concludes that RAG-equipped LLMs can reason reliably over either database type, letting teams choose whichever structure best fits their data without sacrificing performance.
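As a rough illustration of the comparison, here is a toy example: the same question answered over a small invented relational schema with SQLite, alongside the kind of Cypher a graph RAG pipeline might generate (shown as a string only). Neither the schema nor the data comes from the article's Formula 1 dataset.

```python
# Toy SQL-vs-graph comparison for one natural-language question:
# "Which driver won the most races?"
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE drivers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE results (race TEXT, driver_id INTEGER, position INTEGER);
    INSERT INTO drivers VALUES (1, 'Hamilton'), (2, 'Verstappen');
    INSERT INTO results VALUES
        ('Monza', 1, 1), ('Spa', 2, 1), ('Suzuka', 2, 1), ('Monaco', 1, 2);
""")

# SQL that an LLM might produce for the question in a SQL RAG pipeline.
sql = """
    SELECT d.name, COUNT(*) AS wins
    FROM results r JOIN drivers d ON d.id = r.driver_id
    WHERE r.position = 1
    GROUP BY d.name ORDER BY wins DESC LIMIT 1;
"""
print(conn.execute(sql).fetchone())       # ('Verstappen', 2)

# The Cypher a graph RAG pipeline might generate for the same question
# (shown only as a string; running it would require a graph database).
cypher = """
    MATCH (d:Driver)-[:WON]->(r:Race)
    RETURN d.name, count(r) AS wins
    ORDER BY wins DESC LIMIT 1
"""
```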
🔶 RF-DETR Under the Hood: The Insights of a Real-Time Transformer Detection. Object detection has come a long way from rigid anchor grids to adaptive Transformer architectures. RF-DETR, Roboflow's latest real-time detection model, embodies that evolution. Building on DETR's end-to-end design, Deformable DETR's adaptive attention, and LW-DETR's lightweight efficiency, RF-DETR fuses these innovations with a DINOv2 self-supervised backbone for domain adaptability and speed. The result is a model that achieves real-time performance without sacrificing accuracy, capable of both bounding box detection and segmentation. In essence, RF-DETR showcases how adaptive attention and self-supervised vision have made Transformers fast, flexible, and production-ready for modern computer vision tasks.

🔶 Building secure Amazon ElastiCache for Valkey deployments with Terraform. Managing infrastructure through code is becoming essential for secure, scalable cloud deployments. Using Infrastructure as Code (IaC) with Terraform, this guide walks through building a secure Amazon ElastiCache for Valkey cluster, covering both serverless and node-based options. It demonstrates how IaC ensures consistent configurations for encryption, authentication, and network isolation across environments. The walkthrough details step-by-step deployment, from provisioning private subnets and KMS-encrypted storage to implementing token-based authentication and CloudWatch logging. The result is a reproducible, production-grade ElastiCache setup that combines automation, security, and cost efficiency through a modern Terraform workflow.

See you next time!