DataPro

53 Articles
Merlyn from Packt
06 Sep 2024
13 min read

🌠 Llama-3.1-Storm-8B, CausalLM/miniG, RAG pipelines with LlamaIndex and Amazon Bedrock, Claude for Enterprise \ Anthropic, Concrete ML

Custom Tokenizer with Hugging Face Transformers, Multi-Agent Chat Application Using LangGraph

Live Webinar: The Power of Data Storytelling in Driving Business Decisions (September 10, 2024 at 9 AM CST)
Data doesn’t have to be overwhelming. Join our webinar to learn about Data Storytelling and turn complex information into actionable insights for faster decision-making. Click below to check the schedule in your time zone and secure your spot. Can't make it? Register to get the recording instead.
REGISTER FOR FREE
Sponsored

Happy Friday! 🌟
Welcome to DataPro #110: Your Ultimate Data Science & ML Update! 🚀
In the world of AI and ML, sharp reasoning is the key to smarter decisions and impactful leadership. Our latest insights and strategies will help you boost model accuracy, optimize performance, and cut costs with scalable solutions. Dive in for cutting-edge tips and real-world techniques to elevate your data game.

📚 Book Haven: Top Reads & Author Insights
◽ "Data Science for Decision Makers": Elevate your leadership with data science and AI prowess by Jon Howells.
◽ "Data Science for IoT Engineers": Unlock data science techniques and ML applications for innovative IoT solutions by P. G. Madhavan.
◽ "Bash for Data Scientists": Master shell scripting for data science tasks with Oswald Campesato.
◽ "Angular and Machine Learning Pocket Primer": Get the essentials on integrating ML with Angular, also by Oswald Campesato.
◽ "AI, ML, and Deep Learning": Explore advanced AI techniques with Oswald Campesato’s practical guide.

🔍 Model Breakdown: Algorithm of the Week
◽ Custom Tokenizers for Non-English Languages: Dive into Hugging Face Transformers for multilingual models.
◽ Concrete ML Privacy: Secure end-to-end privacy in model training and inference.
◽ Multilingual Multi-Agent Chat with LangGraph: Build diverse language chat applications.
◽ Approximating Stochastic Functions: Techniques for multivariate output functions.

🚀 Trendspotting: Hot Tech Trends
◽ Legal Reasoning Engines: How reasoning drives legal arguments.
◽ R Clinical Flowcharts with shinyCyJS: Use R for clinical flowcharting.
◽ Claude for Enterprise: Explore Anthropic's latest.
◽ IBM Quantum Update: Qiskit SDK v1.2 release news!

🛠️ Platform Showdown: ML Tools & Services
◽ FastAPI for ML Web Apps: Build powerful web apps with FastAPI.
◽ DetoxBench: Benchmarking large language models for fraud and abuse detection.
◽ Llama-3.1-Storm-8B & CausalLM/miniG: New Hugging Face models.
◽ Build RAG Pipelines: Combine LlamaIndex with Amazon Bedrock for robust pipelines.

📊 Success Stories: ML in Action
◽ Ecommerce Data Quality: Strategies for improving data quality.
◽ Essential Python Modules: Must-know Python modules for data engineers.
◽ Avoiding Data Science Mistakes: Tips to steer clear of common pitfalls.
◽ Thomson Reuters Labs: Accelerating AI/ML innovation with AWS MLOps.
◽ Galxe & AlloyDB: Cost-cutting success story.

🌍 ML Newsflash: Industry Buzz & Discoveries
◽ GPT-4 for Customer Service: Redefining standards with GPT-4.
◽ HYGENE: A novel diffusion-based hypergraph generation method.
◽ Yi-Coder: Meet a compact yet powerful LLM for code.
◽ Guided Reasoning: New approaches to enhance multi-agent system intelligence.

Enjoy the newsletter and have a fantastic weekend! ✨

DataPro Newsletter is not just a publication; it’s a complete toolkit for anyone serious about mastering the ever-changing landscape of data and AI. Grab your copy and start transforming your data expertise today!

Calling Data & ML Enthusiasts!
Want to share your insights and build your online reputation? Contribute to our new Packt DataPro column! Discuss tools, share experiences, or ask questions. Gain recognition among 128,000+ data professionals and boost your CV. Simply reply with your Google Docs link or use our feedback form. Whether you’re looking for visibility or a discreet approach, we’re here to support you. Share your content today and engage with our vibrant community! We’re excited to hear from you!

Take our weekly survey and get a free PDF copy of our best-selling book, "Interactive Data Visualization with Python - Second Edition." We appreciate your input and hope you enjoy the book!
Share Your Insights and Shine! 🌟💬

200+ hours of research on AI-led career growth strategies & hacks packed in 3 hours
The only AI Crash Course you need to master 20+ AI tools, multiple hacks & prompting techniques in just 3 hours
You’ll save 16 hours every week & find remote jobs using AI that will pay you up to $10,000/mo
Register & save your seat now (100 free seats only)
Sponsored

📚 Book Haven: Must-Reads & Author Insights
Did you know? “Books are the quietest, most constant friends, holding the world’s treasured wisdom. They offer gentle guidance and timeless lessons, passing their rich inheritance from one generation to the next.”
We’re thrilled to bring you this week’s must-have new releases, straight from the experts to your bookshelf! Whether you're eager to enhance your skills or explore new horizons, now is the perfect moment to add these invaluable resources to your collection.
For a limited time, enjoy 30% off all eBooks at Packtpub.com.
These books are thoughtfully crafted by industry insiders with hands-on experience, offering unique insights you won’t find anywhere else. Don’t let these Packt-exclusive deals slip away. Seize the opportunity to learn from the best at an unbeatable price!

Order Today at $24.99 (was $35.99)
Data Science for Decision Makers: Enhance your leadership skills with data science and AI expertise
By Jon Howells
Struggling to bridge the gap between data science and business leadership? Our new book is here to help!
What you’ll gain:
✔ Master statistics and ML to interpret models and drive decisions.
✔ Identify AI opportunities and oversee data projects from start to finish.
✔ Empower teams to tackle complex problems and build AI solutions.
Elevate your leadership and make data work for you! Get the book now for just $24.99, down from $35.99!

Order Today at $34.98 (was $49.99)
Data Science for IoT Engineers: Master Data Science Techniques and Machine Learning Applications for Innovative IoT Solutions
By Mercury Learning and Information, P. G. Madhavan
Dive into our new book, crafted for engineers, physicists, and mathematicians eager to bridge the gap between theory and practice!
What’s inside:
✔ Integrate systems theory and machine learning seamlessly.
✔ Apply practical solutions like digital twins to real-world problems.
✔ Progress from basics to advanced techniques with ease.
Whether you're tackling IoT challenges or modeling complex systems, this workbook with MATLAB code will guide you every step of the way. Get the eBook now for just $34.98, down from $49.99! Elevate your skills and tackle IoT and complex systems with confidence.

Order Today at $37.99 (was $54.99)
Bash for Data Scientists: A Comprehensive Guide to Shell Scripting for Data Science Tasks
By Mercury Learning and Information, Oswald Campesato
Unlock the power of Bash for your data science projects with our latest book!
What’s inside:
✔ Master Bash for efficient data processing with practical, real-world examples.
✔ Learn to integrate with Pandas and databases for advanced data handling.
✔ Get hands-on with grep, sed, and awk to clean and manage datasets effectively.
Grab the eBook now for just $37.99, originally $54.99! Elevate your scripting skills and streamline your data tasks today!

Order Today at $27.98 (was $39.99)
Angular and Machine Learning Pocket Primer: A Comprehensive Guide to Angular and Integrating Machine Learning
By Mercury Learning and Information, Oswald Campesato
Ready to elevate your Angular apps with machine learning? Our latest Pocket Primer has you covered!
What’s inside:
✔ Seamless integration of Angular and machine learning using TensorFlow.js and Keras.
✔ Practical, step-by-step tutorials and real-world examples.
✔ Comprehensive coverage of Angular basics, UI development, and machine learning models.
Get the eBook now for just $27.98, originally $39.99! Transform your skills and build sophisticated applications with ease.

Order Today at $41.98 (was $59.99)
Artificial Intelligence, Machine Learning, and Deep Learning: A Practical Guide to Advanced AI Techniques
By Mercury Learning and Information, Oswald Campesato
Discover the world of AI with our new book, perfect for expanding your skills from basics to advanced techniques!
What’s inside:
✔ In-depth coverage of AI, machine learning, and deep learning.
✔ Practical examples and hands-on tutorials with Keras, TensorFlow, and Pandas.
✔ Explore classifiers, deep learning architectures, NLP, and reinforcement learning.
Get the eBook now for just $41.98, down from $59.99!
Transform your understanding and apply these cutting-edge concepts in real-world scenarios.

🔍 Model Breakdown: Unveiling the Algorithm of the Week
➜ How to Create a Custom Tokenizer for Non-English Languages with Hugging Face Transformers? This blog explains the importance of tokenization in NLP and provides a detailed guide on training a custom tokenizer for non-English languages using Hugging Face libraries, ensuring improved model performance for diverse datasets.
➜ End-to-end privacy for model training and inference with Concrete ML: This blog explores how to achieve end-to-end privacy in collaborative machine learning using federated learning and fully homomorphic encryption (FHE). It details a demo with scikit-learn and Concrete ML for secure model training and inference.
➜ Building a Multilingual Multi-Agent Chat Application Using LangGraph: This blog details the development of a multilingual chat application to bridge language barriers in workplaces. It covers building features using LangChain and LangGraph, including agent design, translation workflows, and deployment with FastAPI.
➜ Approximating Stochastic Functions with Multivariate Outputs: The article describes an enhanced method for training generative machine learning models, named Pin Movement Training (PMT). It extends the original PMT, which approximated single-output stochastic functions, to handle multiple-output functions. The approach uses a neural network and a hypersphere-based Z-space to map and approximate multidimensional outputs, like autoencoders but with uniform sampling for better results.

Developing for iOS? Setapp's 2024 report on the state of the iOS market in the EU is a must-see
How do users in the EU find apps? What's the main source of information about new apps?
Would users install your app from a third-party app marketplace?
Set yourself up for success with these and more valuable marketing insights in Setapp Mobile's report iOS Market Insights for EU.
Get Insights free
Sponsored

🚀 Trendspotting: What's Next in Tech Trends
➜ Reasoning as the Engine Driving Legal Arguments: The article explores how tribunals assess evidence in legal cases, focusing on three key stages: determining evidence relevance, evaluating trustworthiness, and weighing competing evidence. It highlights the role of "reasoning sentences" in explaining decision-making and discusses machine learning techniques for identifying these sentences in legal documents.
➜ Use R to build Clinical Flowchart with shinyCyJS: The blog discusses creating Clinical Flowcharts for visualizing clinical trials, focusing on various methods, particularly using R. It details challenges and solutions in drawing flowcharts, including software limitations and customizations with shinyCyJS for precise visual representation.
➜ Claude for Enterprise \ Anthropic: The Claude Enterprise plan now offers enhanced features for secure collaboration, including a 500K context window, GitHub integration, and advanced security measures. This allows teams to leverage internal knowledge while safeguarding data.
➜ IBM Quantum Computing - Release news: Qiskit SDK v1.2 is here! Qiskit SDK v1.2 introduces major updates, including Rust-based circuit infrastructure for faster performance, improved synthesis and transpilation, and new features. It also ends support for Python 3.8, requiring Python 3.9 or later.

🛠️ Platform Showdown: Comparing ML Tools & Services
➜ Using FastAPI for Building ML-Powered Web Apps: This tutorial demonstrates building a machine learning web app using FastAPI and Jinja2 templates.
It covers creating a prediction API for a Random Forest model and integrating it with a web interface for user interaction.
➜ DetoxBench: Benchmarking large language models for multitask fraud & abuse detection. This paper introduces a benchmark suite to evaluate large language models (LLMs) for detecting and mitigating fraud and abuse in various real-world scenarios, highlighting performance gaps and offering a tool for improving LLMs in high-stakes applications.
➜ Llama-3.1-Storm-8B · Hugging Face: The Llama-3.1-Storm-8B model outperforms Meta’s Llama-3.1-8B-Instruct and Hermes-3 across multiple benchmarks. It improves instruction-following, QA, reasoning, and function-calling via self-curation, fine-tuning, and model merging techniques.
➜ CausalLM/miniG · Hugging Face: The miniG model has two versions: standard and "alt," the latter trained with masked context to improve stability. Trained on a large dataset with text and image support, it performs best with Hugging Face Transformers for minimal performance degradation.
➜ Build powerful RAG pipelines with LlamaIndex and Amazon Bedrock: This blog explores using Retrieval Augmented Generation (RAG) techniques to enhance large language models (LLMs) by integrating external knowledge sources. It discusses building advanced RAG pipelines with LlamaIndex and Amazon Bedrock, covering topics like query routing, sub-question handling, and stateful agents.

📊 Success Stories: Real-World ML Case Studies
➜ Improving ecommerce data quality: This blog details how Lowe’s enhanced its website search accuracy by fine-tuning OpenAI’s GPT-3.5 model. By applying advanced prompt engineering, Lowe’s improved product data quality, reduced associate workload, and achieved a 20% accuracy boost in product tagging.
➜ 10 Built-In Python Modules Every Data Engineer Should Know: This article highlights essential Python modules for data engineering, including tools for file management, data serialization, database interaction, and text processing.
It covers how modules like `os`, `pathlib`, `shutil`, and `csv` can enhance data engineering tasks.
➜ 5 Common Data Science Mistakes and How to Avoid Them: This blog outlines five common mistakes in data science projects, such as unclear objectives, neglecting basics, poor visualizations, lack of feature engineering, and overemphasizing accuracy. It offers practical solutions to avoid these pitfalls and improve project outcomes.
➜ How Thomson Reuters Labs achieved AI/ML innovation at pace with AWS MLOps services? This post details how Thomson Reuters Labs developed a standardized MLOps framework using AWS SageMaker to streamline ML processes. It highlights the creation of TR MLTools and MLTools CLI to enhance efficiency, standardize practices, and accelerate AI/ML innovation.
➜ Galxe migrates to AlloyDB for PostgreSQL, cutting costs by 40%: This blog explains how Galxe is addressing Web3 challenges by using AlloyDB for PostgreSQL and Google Cloud services. It highlights Galxe's innovations in decentralized identity, gamified user experiences, and scalable infrastructure to enhance Web3 adoption and performance.

🌍 ML Newsflash: Latest Industry Buzz & Discoveries
➜ Using GPT-4 to deliver a new customer service standard: Ada, valued at $1.2B with $200M in funding, is leading a $100B shift in customer service with its AI-native automation platform. Since its 2016 inception, Ada has doubled resolution rates using OpenAI’s GPT-4, achieving up to 80% resolution and setting new industry standards for effectiveness.
➜ HYGENE: A Diffusion-based Hypergraph Generation Method. The paper introduces HYGENE, a diffusion-based method for generating realistic hypergraphs. Using a bipartite representation, it iteratively expands nodes and hyperedges through a denoising process, effectively modeling complex hypergraph structures. This is the first deep learning approach for hypergraph generation.
➜ Meet Yi-Coder: A Small but Mighty LLM for Code.
Yi-Coder is an open-source series of coding-focused LLMs, available in 1.5B and 9B parameter sizes. It offers advanced coding performance with up to 128K token context modeling, surpassing models like CodeQwen1.5 and DeepSeek-Coder, and excels in benchmarks such as LiveCodeBench and HumanEval.
➜ Guided Reasoning: A New Approach to Improving Multi-Agent System Intelligence. Gregor Betz from Logikon AI introduces Guided Reasoning, a multi-agent system where a guide agent helps client agents improve their reasoning through structured methods. This approach, using argument maps and pros/cons evaluations, aims to enhance clarity and accuracy in AI decision-making and explanations.

See you next time!

Merlyn from Packt
12 Sep 2024
11 min read

🌐 IBM's PowerLM-3B & PowerMoE-3B models, Apple’s Byte-Level ASR Optimization, AtScale’s Open-Source Semantic Modeling Language, LG’s EXAONEPath

Google’s AI detective, Regnology Automates Ticket-to-Code with agentic GenAI on Vertex AI, MedFuzz

Grow your business & career by 10x using AI Strategies in 4 hrs! 🤯
Join GrowthSchool's AI Business Growth & Strategy Crash Course and discover how to revolutionise your approach to business on 12th September at 10 AM EST.
In just 4 hours, you’ll gain the tools, insights, and strategies to not just survive, but dominate your market.
This is more than just a workshop; it's a turning point.
The first 100 to register get in for FREE. Don’t miss the chance to change your business trajectory forever.
Sign up here to save your seat! 👈
Sponsored

Welcome to DataPro #111: Your Weekly Dose of Data Science & ML Magic! 🚀
We’re now landing in your inbox every Thursday to keep you sharp and ahead of the game!
In the ever-evolving realm of AI and ML, it's all about harnessing smart insights for impactful decisions and stellar leadership. Dive into our new Packt Signature Series, where you'll find expert tips on everything from real-time data management to mastering AI modeling. We’re here to equip you with the tools you need to navigate the data world like a pro.
This week, we’ve got cutting-edge strategies to boost your model accuracy, optimize performance, and reduce costs with scalable solutions. Get ready for top-notch tips and practical techniques to supercharge your data skills.

📚 Top Reads & Author Insights:
✦ Building AI Intensive Python Applications: Dive deep into advanced AI apps.
✦ Databricks ML in Action: Real-world applications and best practices.
✦ Generative AI Application Integration Patterns: Innovative uses of generative AI.
✦ Polars Cookbook: Essential recipes for efficient data handling.
✦ Building LLM Powered Applications: Building with large language models.
✦ Building Data-Driven Applications with LlamaIndex: Leveraging LlamaIndex for robust applications.
✦ Data Quality in the Age of AI: Ensuring top-notch data quality.
✦ Modern Computer Vision with PyTorch - Second Edition: Updated techniques in computer vision.
✦ Accelerate Model Training with PyTorch 2.X: Speed up your model training.
✦ Mastering PyTorch - Second Edition: The ultimate guide to mastering PyTorch.

🔍 Algorithm Spotlight:
✦ Apple’s Byte-Level ASR Optimization: A new AI algorithm for speech recognition.
✦ IBM’s PowerLM-3B & PowerMoE-3B: 3-billion-parameter language models trained with IBM’s Power scheduler.
✦ AtScale’s Open-Sourced SML: Transforming analytics with a new semantic modeling framework.
✦ LG’s EXAONEPath: Enhancing histopathology analysis with a pre-trained model.

🚀 Tech Trendwatch:
✦ Tracing Memory Allocation in Python: Learn how to track memory usage.
✦ Anomaly Detection in Streaming Data: Using Amazon Managed Service for Apache Flink.

🛠️ ML Tool Showdown:
✦ 7 Free Cloud IDEs You Need: Explore top IDEs for data science.
✦ End-to-End Data Science Pipelines: From ingestion to visualization.
✦ Sustainable MLOps: Optimizing operations for sustainability.

📊 Success Stories:
✦ GraphRAG’s Auto-Tuning: Adapting rapidly to new domains.
✦ Enterprise Data Quality Guide: Navigating enterprise data challenges.
✦ AI Agents for Daily Tasks: Automating routine app tasks.

🌍 ML Newsflash:
✦ Google’s AI Detective: Solving challenges with Gemini 1.5 Pro.
✦ Regnology’s Gen AI on Vertex AI: Automating ticket-to-code processes.
✦ MedFuzz on LLM Robustness: Evaluating LLMs in medical contexts.

Stay tuned for your weekly dose of data brilliance! 🚀

Take our weekly survey and get a free PDF copy of our best-selling book, "Interactive Data Visualization with Python - Second Edition." We appreciate your input and hope you enjoy the book!
Share Your Insights and Shine! 🌟💬

📚 Packt Signature Series: Must-Reads & Author Insights
Step into a world of expert-driven knowledge with our one-of-a-kind in-house content, crafted by industry pros to deliver the freshest insights on the latest tech releases. Discover how these cutting-edge titles are shaping the data landscape and unlocking the "whats," "hows," and "whys" behind emerging technologies. Whether you're looking to sharpen your skills or dive into something entirely new, there's never been a better time to expand your library with these essential resources.
For a limited time, enjoy 30% off all eBooks at Packtpub.com. These books are more than just guides; they’re packed with real-world expertise from those who know the industry inside and out, offering perspectives you simply won’t find anywhere else.

➜ Building AI Intensive Python Applications
This book guides you through building powerful AI applications using large language models (LLMs), vector databases, and Python frameworks. You'll learn how to optimize AI performance, implement advanced techniques like retrieval-augmented generation, and tackle challenges like hallucinations and data leakage, ultimately creating reliable, high-impact AI solutions.
Order Today at $41.98 (was $59.99)

➜ Databricks ML in Action
This book is all about mastering the Databricks platform for machine learning and data science. It helps data engineers and scientists solve key problems by offering practical, cloud-agnostic examples and code projects.
You’ll learn how to use Databricks tools to streamline workflows, improve model performance, and integrate with third-party apps.
Order Today at $24.99 (was $35.99)

➜ Generative AI Application Integration Patterns
This book guides you through designing and integrating GenAI applications. You’ll learn essential tools and strategies, from prompt engineering to advanced techniques like retrieval-augmented generation. It provides practical examples, a clear 4-step framework, and covers ethical considerations for deploying GenAI models effectively.
Order Today at $27.98 (was $39.99)

➜ Polars Cookbook
This cookbook is your go-to guide for mastering Python Polars, a high-performance library for efficient data analysis. It offers step-by-step recipes for handling large datasets, advanced querying, and performance optimization. With practical tips on data manipulation, integration, and deployment, you'll boost your data workflows and analysis skills.
Order Today at $24.99 (was $35.99)

➜ Building LLM Powered Applications
This book helps you integrate LLMs into real-world apps using LangChain for orchestration. It covers the basics and advanced techniques of prompt engineering, explores various LLM architectures, and guides you through using powerful tools to create intelligent agents. You'll also learn about ethical considerations and the future of large foundation models.
Order Today at $27.98 (was $39.99)

➜ Building Data-Driven Applications with LlamaIndex
This guide explores Generative AI and LlamaIndex, focusing on overcoming LLM limitations and building interactive applications. Learn to manage text chunking, security, and real-time data challenges. With hands-on projects, you'll master data ingestion, indexing, querying, and deployment, equipping you to develop and customize sophisticated AI-driven solutions.
Order Today at $24.99 (was $35.99)

➜ Data Quality in the Age of AI
This book emphasizes the crucial role of data quality in AI success.
It provides strategies to improve and measure data quality, offering practical steps to enhance data-driven decision-making. With real-world examples and actionable insights, it equips teams to optimize their data culture, leading to better AI performance and business outcomes.
Order Today at $55.98 (was $79.99)

➜ Modern Computer Vision with PyTorch - Second Edition
This book offers a deep dive into neural network architectures and PyTorch for computer vision tasks. Learn to build solutions for image classification, object detection, and more using state-of-the-art models like CLIP and Stable Diffusion. With code available on GitHub and Google Colab, you'll gain practical skills for real-world applications and production deployment.
Order Today at $33.99 (was $48.99)

➜ Accelerate Model Training with PyTorch 2.X
This book helps you optimize PyTorch model training, focusing on reducing build time and improving efficiency. Learn to speed up training with multicore systems, multi-GPU setups, and mixed precision. You'll explore techniques for model simplification, specialized libraries, and data pipeline improvements to enhance performance and model quality.
Order Today at $24.99 (was $35.99)

➜ Mastering PyTorch - Second Edition
This book guides you through building advanced neural network models with PyTorch, including CNNs, RNNs, and transformers. Learn to optimize training with GPUs, deploy models on mobile, and utilize libraries like Hugging Face and PyTorch Lightning. It covers deep learning across text, vision, and music, enhancing your AI skills with practical techniques.
Order Today at $28.99 (was $41.99)

🔍 Model Breakdown: Unveiling the Algorithm of the Week
➜ Apple Researchers Propose a Novel AI Algorithm to Optimize a Byte-Level Representation for Automatic Speech Recognition ASR and Compare it with UTF-8 Representation: The blog discusses a new method for enhancing multilingual automatic speech recognition (ASR) using vector quantized auto-encoders.
This approach improves byte-level representation accuracy, optimizes resource usage, and reduces error rates, outperforming UTF-8 and character-based methods in multilingual settings.
➜ PowerLM-3B and PowerMoE-3B Released by IBM: Revolutionizing Language Models with 3 Billion Parameters and Advanced Power Scheduler for Efficient Large-Scale AI Training. IBM's PowerLM-3B and PowerMoE-3B models showcase advancements in large-scale language model training. Utilizing IBM’s Power scheduler, these models achieve high efficiency and scalability, optimizing learning rates and computational costs for improved performance in NLP tasks.
➜ AtScale Open-Sourced Semantic Modeling Language (SML): Transforming Analytics with Industry-Standard Framework for Interoperability, Reusability, and Multidimensional Data Modeling Across Platforms: AtScale has open-sourced its Semantic Modeling Language (SML) to create a standardized, interoperable language for semantic modeling across platforms. Built on YAML, SML supports complex data structures, promotes reusability, and integrates with modern development practices, aiming to enhance collaboration and efficiency in analytics.
➜ LG AI Research Open-Sources EXAONEPath: Transforming Histopathology Image Analysis with a 285M Patch-level Pre-Trained Model for Variety of Medical Prediction, Reducing Genetic Testing Time and Costs: LG AI Research's EXAONEPath enhances digital histopathology by addressing Whole Slide Image (WSI) challenges with advanced self-supervised learning and stain normalization. This open-source model improves diagnostic accuracy, reduces genetic testing time, and supports various medical tasks.

🚀 Trendspotting: What's Next in Tech Trends
➜ How to Trace Memory Allocation in Python? This tutorial demonstrates how to use Python's `tracemalloc` module for tracing memory allocation in memory-intensive operations.
It covers setting up a sample dataset, tracking memory usage before and after processing, and comparing snapshots to debug memory issues.
➜ Anomaly detection in streaming time series data with online learning using Amazon Managed Service for Apache Flink: This post describes building a real-time anomaly detection system for time series data using AWS services. It outlines how to deploy an end-to-end solution with Amazon Managed Service for Apache Flink, Kafka, and SageMaker, focusing on detecting unusual patterns in streaming data.

🛠️ Platform Showdown: Comparing ML Tools & Services
➜ 7 Free Cloud IDE for Data Science That You Are Missing Out: To start data science projects quickly, explore these 7 Cloud IDEs: Kaggle Notebooks, Deepnote, Lightning.ai, Datalab by DataCamp, Google Colab, Amazon SageMaker Studio Lab, and DataLore. Each provides pre-built environments and free access to GPUs.
➜ Developing End-to-End Data Science Pipelines with Data Ingestion, Processing, and Visualization: The article discusses the iterative nature of data science projects, emphasizing the importance of data ingestion, processing, and visualization. It outlines an end-to-end process involving business understanding, data preparation, model building, and monitoring.
➜ Optimizing MLOps for Sustainability: The post outlines optimizing MLOps for sustainability using AWS by improving data preparation, model training, and deployment. Key practices include selecting low-carbon impact regions, using efficient storage, leveraging SageMaker’s tools, and monitoring with AWS services to minimize resource use and emissions.

📊 Success Stories: Real-World ML Case Studies
➜ GraphRAG auto-tuning provides rapid adaptation to new domains: Microsoft Research's GraphRAG uses large language models to build domain-specific knowledge graphs from text, enabling complex query responses.
The tool automates the creation of domain-specific prompts to enhance graph accuracy and streamline knowledge extraction.
➜ The “Who Does What” Guide to Enterprise Data Quality: This analysis explores enterprise data quality management, focusing on roles and processes in data detection, triage, resolution, and measurement. It highlights the importance of foundational versus derived data products, and strategies for improving data quality and efficiency.
➜ Can AI Agents Do Your Day-to-Day Tasks on Apps? The blog introduces AppWorld, a new benchmarking framework for AI agents that interact with various apps to perform complex tasks. It features a simulated environment, a benchmark of intricate tasks, and a robust evaluation framework to test and improve AI agents’ performance.

🌍 ML Newsflash: Latest Industry Buzz & Discoveries
➜ Google’s AI detective: The Needle in a Haystack test and how Gemini 1.5 Pro solves it. The blog discusses Google's Gemini 1.5 Pro, an AI model excelling in the "Needle in a Haystack" test. It showcases the model's ability to retrieve specific information from vast datasets across text, video, and audio, outperforming GPT-4 in complex retrieval tasks.
➜ Regnology Automates Ticket-to-Code with GenAI on Vertex AI: The blog discusses Regnology's solution to the "Ticket-to-Code Problem," where bug reports are transformed into actionable code. Their Ticket-to-Code Writer tool, enhanced by Google’s Vertex AI and Gemini 1.5 Pro, automates this process, boosting efficiency by 60% and improving accuracy.
➜ MedFuzz: Exploring the robustness of LLMs on medical challenge problems. LLMs excel in medical benchmarks but often oversimplify complex real-world scenarios. MedFuzz, inspired by security red-teaming and fuzzing, introduces adversarial challenges to test LLMs against these simplifying assumptions.
This approach assesses their true effectiveness in nuanced clinical settings.

Merlyn from Packt
18 Sep 2024
6 min read

[Save 30%] on Top-Selling Print + eBooks for Data Professionals: Boost Your Knowledge in AI and Data Analytics!

For a limited time, save on the best-selling books that will elevate your skills and knowledge!

👋 Hello,

✹ Welcome to Packt’s Signature Series: New Titles Just Arrived!

📚 We’re excited to present a new collection in our Signature Series, featuring the best-selling titles in the data industry. Packed with insights on Generative AI and multimodal systems, this collection is available for a limited time at 30% off both print and e-book formats. This offer ends Sunday, September 22nd. Don’t miss your chance to upskill and elevate your career. Let’s dive in!

➜ Building LLM Powered Applications: This new title is all about helping engineers and data pros use large language models (LLMs) effectively. It tackles key challenges like embedding LLMs into real-world apps and mastering prompt engineering techniques. You’ll learn to orchestrate LLMs with LangChain and explore various models, making it easier to create intelligent systems that can handle both structured and unstructured data. It’s a great way to boost your skills, whether you’re new to AI or already experienced! Start your free trial for access, renewing at $19.99/month.
eBook $27.98 (was $39.99) | Print + eBook $34.98 (was $49.99)
➜ Python for Algorithmic Trading Cookbook: This book is your go-to guide for using Python in trading. It helps you tackle key issues like acquiring and visualizing market data, designing and backtesting trading strategies, and deploying them live with APIs. You’ll learn practical techniques to gather data, analyze it, and optimize your strategies using tools like OpenBB and VectorBT. Whether you’re just starting or looking to refine your skills, this book equips you with the know-how to trade smarter with Python!
Start your free trial for access, renewing at $19.99/month.eBook $27.98 $39.99Print + eBook $36.99 $49.99➜ Microsoft Power BI Cookbook - Third Edition: The Power BI Cookbook is your essential guide to mastering data analysis and visualization with Power BI. It covers using Microsoft Data Fabric, managing Hybrid tables, and creating effective scorecards. Learn to transform complex data into clear visuals, implement robust models, and enhance reports with real-time data. This updated edition prepares you for future AI innovations, making it a must-have for beginners and seasoned users alike! Start your free trial for access, renewing at $19.99/month.eBook $29.99 $43.99Print + eBook $41.98 $59.99➜ The Definitive Guide to Power Query (M): The Definitive Guide to Power Query (M) focuses on mastering data transformation with Power Query. It covers fundamental and advanced concepts through hands-on examples that address real-world problems. You'll learn the Power Query M language, optimize performance, handle errors, and implement efficient data processes. By the end, you'll have the skills to enhance your data analysis effectively! Start your free trial for access, renewing at $19.99/month.eBook $43.99Print + eBook $37.99 $54.99➜ Mastering PyTorch - Second Edition: This is your essential resource for building advanced neural network models with PyTorch. You'll explore tools like Hugging Face, fastai, and Docker, learning to create models for text, images, and music. With hands-on examples, you'll master training optimization, mobile deployment, and various network types, equipping you to tackle complex AI tasks using the PyTorch ecosystem! Start your free trial for access, renewing at $19.99/month.eBook $28.99 $41.99Print + eBook $40.99 $51.99➜ Unlocking the Secrets of Prompt Engineering: It'syour guide to mastering AI-driven writing with large language models (LLMs). It covers essential techniques and applications, from content creation to chatbots. 
With practical examples, you'll learn to generate product descriptions and tackle advanced uses like podcast creation. The book emphasizes ethical practices and optimization strategies, preparing you to leverage AI for improved writing, creativity, and productivity! Start your free trial for access, renewing at $19.99/month.eBook $24.99 $35.99Print + eBook $30.99 $44.99➜ ChatGPT for Cybersecurity Cookbook: Your essential guide to using AI in cybersecurity. It helps you automate tasks like penetration testing, risk assessment, and threat detection with ChatGPT. Each recipe provides step-by-step instructions for generating commands, writing code, and creating tools with the OpenAI API and Python. You'll explore innovative strategies and optimize workflows, gaining confidence in AI-driven techniques to excel in the rapidly evolving cybersecurity landscape! Start your free trial for access, renewing at $19.99/month.eBook $27.98 $39.99Print + eBook $34.98 $49.99➜ Mastering NLP from Foundations to LLMs:Your complete guide to Natural Language Processing (NLP) with Python. It covers the mathematical foundations of machine learning and essential topics like linear algebra and statistics. You'll learn to preprocess text, classify it, and implement advanced techniques, including large language models (LLMs). With practical Python code samples and insights into future trends, you'll gain the skills to tackle real-world NLP challenges confidently and effectively design ML-NLP systems! Start your free trial for access, renewing at $19.99/month.eBook $29.99 $42.99Print + eBook $46.99 $52.99➜ Learn Microsoft Fabric: This title is your essential guide to using Microsoft Fabric for data integration and analytics. It explores key features with real-world examples, helping you build solutions for lakehouses, data warehouses, and real-time analytics. You'll learn to effectively monitor your Fabric platform and cover workloads like Data Factory and Power BI. 
By the end, you'll be equipped to unlock AI-driven insights and navigate the analytics landscape confidently! Start your free trial for access, renewing at $19.99/month.eBook $24.99 $35.99Print + eBook $35.98 $44.99➜ Building Data-Driven Applications with LlamaIndex: This book is your comprehensive guide to leveraging Generative AI and large language models (LLMs). It addresses challenges like memory constraints and data gaps while teaching you to build interactive applications with LlamaIndex. You'll learn to ingest and index data, create optimized indexes, and query your knowledge base through hands-on projects. By the end, you'll be equipped to troubleshoot LLM issues and confidently deploy your AI-driven applications! Start your free trial for access, renewing at $19.99/month.eBook $24.99 $35.99Print + eBook $30.99 $44.99➜ OpenAI API Cookbook: This new title is all about using the OpenAI API to create smart applications. It helps engineers and data pros understand the basics, set up their API, and build tailored tools like chatbots and virtual assistants. You’ll learn practical recipes to enhance user experience and integrate AI into your workflows, making your projects more efficient and innovative! Start your free trial for access, renewing at $19.99/month.eBook $21.99 $31.99Print + eBook $27.98 $39.99Loved Those Titles? 
Check These Out!
➜ Data Governance Handbook
➜ Generative AI for Cloud Solutions
➜ Data-Centric Machine Learning with Python
➜ Modern Python Cookbook - Third Edition
We’ve got more great things coming your way. See you soon!

Merlyn from Packt
19 Sep 2024
10 min read

Google AI’s DataGemma, PyTorch Automatic Mixed Precision Library, Conversational Analytics in Looker, Mistral-Small-Instruct-2409, Comet’s Opik, OpenAI o1 System Card

BigQuery’s Contribution Model, Apache Airflow ETL on Google Cloud, Graviton4 EC2 Instances

Sponsored: Join Roman Lavrik from Deloitte at the Snyk-hosted DevSecCon 2024
Snyk is thrilled to announce DevSecCon 2024, Developing AI Trust (Oct 8-9), a FREE virtual summit designed for DevOps, developer, and security pros of all levels. Join Roman Lavrik from Deloitte, among many others, and learn some prescriptive DevSecOps methods for AI-powered development. Save your spot.

Welcome to DataPro #112: Your Weekly Fix of Data Science & ML Magic! 🌟
In the fast-moving world of AI and ML, staying ahead means leveraging smart strategies for bold decisions. This week, we’re bringing you expert insights from our new Packt Signature Series. From real-time data mastery to AI modeling techniques, we’ve got everything you need to level up your data game!
Get ready to elevate your model accuracy, supercharge performance, and cut costs with the latest in scalable solutions. Dive into this week’s must-read articles, tips, and practical techniques.

📚 Must-Reads for Data Pros
✦ LLM-Powered Apps: Build smarter AI tools
✦ Python for Trading: Algorithmic insights
✦ Power BI Cookbook: Master data visualization
✦ The Prompt Engineering Playbook: Unlock AI secrets
✦ Mastering PyTorch: Deep learning unleashed

🔍 Algorithm Spotlight: Dive Deep into the Tech
✦ Automating Metrics with Amazon Prometheus: Simplify data tracking on EKS
✦ Graviton4 EC2 Instances: Memory-optimized power for your AI workloads
✦ OpenAI Safety Practices: An update on securing AI
✦ Mistral AI Release: Open-source models with unmatched flexibility

🚀 Trendspotting: The Future of AI
✦ Eureka AI Progress: Understand and evaluate AI advancements
✦ OpenAI o1 System Card: A glance into AI innovations
✦ Conversational Analytics Preview: What’s new in Looker?
✦ Comet’s Opik: Streamlining LLM evaluation and prompt tracking

🛠️ Tool Showdown: Which ML Platform Reigns Supreme?
✦ BigQuery’s Contribution Model: Fresh insights for your data
✦ Running Airflow on Google Cloud: Three easy approaches
✦ Python Tricks: Merge dictionaries like a pro
✦ Google AI’s DataGemma: A Set of Open Models that Utilize Data Commons

📊 Case Studies: ML Success Stories
✦ Handling Large Text with Longformer: A Hugging Face deep dive
✦ Confluent & Vertex AI: Integrating LLMs for big wins
✦ What Makes a Data Business Thrive? Lessons from the top

🌍 ML Buzz: Industry News & Discoveries
✦ Cracking PyTorch’s Mixed Precision Library: What you need to know
✦ MLflow, Azure, Docker: Managing models with ease
✦ Self-Learning Models: Teaching AI to improve autonomously

Get ready for a week of data-driven breakthroughs!
Take our weekly survey and get a free PDF copy of our best-selling book, "Interactive Data Visualization with Python - Second Edition." We appreciate your input and hope you enjoy the book!
Share Your Insights and Shine!
🌟💬Cheers,Merlyn Shelley,Editor-in-Chief, Packt.Sponsored📚 Packt Signature Series: Must-Reads & Author InsightsWe’re excited to present a new collection in our Signature Series, featuring the best-selling titles in the data industry. Packed with insights on Generative AI and multimodal systems, this collection is available for a limited time at 30% off both print and e-book formats. This offer ends Sunday, September 22nd. Don’t miss your chance to upskill and elevate your career. Let’s dive in!➜ Building LLM Powered Applications: This new titleis all about helping engineers and data pros use large language models (LLMs) effectively. It tackles key challenges like embedding LLMs into real-world apps and mastering prompt engineering techniques. You’ll learn to orchestrate LLMs with LangChain and explore various models, making it easier to create intelligent systems that can handle both structured and unstructured data. It’s a great way to boost your skills, whether you’re new to AI or already experienced! Start your free trial for access, renewing at $19.99/month.eBook $27.98 $39.99Print + eBook $34.98 $49.99➜ Python for Algorithmic Trading Cookbook: This bookis your go-to guide for using Python in trading. It helps you tackle key issues like acquiring and visualizing market data, designing and backtesting trading strategies, and deploying them live with APIs. You’ll learn practical techniques to gather data, analyze it, and optimize your strategies using tools like OpenBB and VectorBT. Whether you’re just starting or looking to refine your skills, this book equips you with the know-how to trade smarter with Python! Start your free trial for access, renewing at $19.99/month.eBook $27.98 $39.99Print + eBook $36.99 $49.99➜ Microsoft Power BI Cookbook - Third Edition: The Power BI Cookbook is your essential guide to mastering data analysis and visualization with Power BI. It covers using Microsoft Data Fabric, managing Hybrid tables, and creating effective scorecards. 
Learn to transform complex data into clear visuals, implement robust models, and enhance reports with real-time data. This updated edition prepares you for future AI innovations, making it a must-have for beginners and seasoned users alike! Start your free trial for access, renewing at $19.99/month.eBook $29.99 $43.99Print + eBook $41.98 $59.99➜ The Definitive Guide to Power Query (M): The Definitive Guide to Power Query (M) focuses on mastering data transformation with Power Query. It covers fundamental and advanced concepts through hands-on examples that address real-world problems. You'll learn the Power Query M language, optimize performance, handle errors, and implement efficient data processes. By the end, you'll have the skills to enhance your data analysis effectively! Start your free trial for access, renewing at $19.99/month.eBook $43.99Print + eBook $37.99 $54.99🔍 Model Breakdown: Unveiling the Algorithm of the Week➜ Automating metrics collection on Amazon EKS with Amazon Managed Service for Prometheus managed scrapers: This blog discusses how Amazon Managed Service for Prometheus simplifies monitoring containerized applications in Amazon EKS by introducing a fully-managed, agentless scraper for Prometheus metrics, reducing operational overhead and enhancing efficiency through Terraform and AWS CloudFormation automation.➜ Now available: Graviton4-powered memory-optimized Amazon EC2 X8g instances. 
This post introduces Graviton-4-powered X8g instances, offering high memory, enhanced performance, scalability, and security for applications like databases and electronic design automation, emphasizing their efficiency, flexibility, and improved price-performance over previous instances.➜ An update on OpenAI safety & security practices: This post introduces OpenAI's Safety and Security Committee, outlining five key recommendations to enhance governance, security, transparency, collaboration, and safety frameworks for AI model development and deployment, ensuring responsible and secure advancements in AI technology.➜ Mistral AI Released Mistral-Small-Instruct-2409: A Game-Changing Open-Source Language Model Empowering Versatile AI Applications with Unmatched Efficiency and Accessibility. This article introduces Mistral AI's release of Mistral-Small-Instruct-2409, a powerful open-source large language model designed to enhance AI performance, promote accessibility, and support various natural language processing tasks with an emphasis on transparency, collaboration, and ethical AI development.🚀 Trendspotting: What's Next in Tech Trends➜ Eureka: Evaluating and understanding progress in AI. This post introduces the EUREKA framework for evaluating AI models, emphasizing the need for in-depth measurement beyond standard benchmarks. It aims to uncover strengths, weaknesses, and real-world capabilities of state-of-the-art models through transparent and reproducible evaluations.➜ OpenAI o1 System Card: This report outlines safety evaluations conducted before releasing OpenAI o1 models, addressing risks like bias, hallucinations, and disallowed content. 
It highlights mitigations, advanced reasoning capabilities, and overall safety ratings under OpenAI's Preparedness Framework.➜ Conversational Analytics in Looker is now in preview: This post introduces Looker's Conversational Analytics, powered by AI and Looker’s semantic model, enabling users to ask data questions in natural language. It simplifies business intelligence, enhances accessibility, and promotes data-driven decision-making across organizations.➜ Comet Launches Opik: A Comprehensive Open-Source Tool for End-to-End LLM Evaluation, Prompt Tracking, and Pre-Deployment Testing with Seamless Integration. This article introduces Opik, an open-source platform by Comet for enhancing observability and evaluation of large language models (LLMs). Opik helps developers and data scientists monitor, test, and track LLM applications, improving performance reliability and addressing issues like hallucinations.đŸ› ïž Platform Showdown: Comparing ML Tools & Services➜ Introducing a new contribution analysis model in BigQuery: This post introduces contribution analysis in BigQuery ML, which helps organizations identify key data drivers behind trends and fluctuations, enabling faster, data-driven decisions by analyzing test and control datasets, and finding statistically significant contributors at scale.➜ Three different ways to run Apache Airflow ETL on Google Cloud: This article explores three ways to run Apache Airflow on Google Cloud, comparing Compute Engine, managed solutions, and infrastructure setups. 
It highlights the pros and cons of each, providing Terraform code for implementation.
➜ 3 Simple Ways to Merge Python Dictionaries: This blog explains three common methods to merge dictionaries in Python: using the `update()` method, dictionary unpacking (`{**dict1, **dict2}`), and the union operator (`|`), providing code examples for each approach.
➜ Google AI Introduces DataGemma: A Set of Open Models that Utilize Data Commons through Retrieval Interleaved Generation (RIG) and Retrieval Augmented Generation (RAG). Google's DataGemma addresses hallucinations in large language models (LLMs) by grounding them in real-world statistical data through Google’s Data Commons. It introduces two advanced models, RAG-27B-IT and RIG-27B-IT, enhancing precision for tasks requiring deep analysis and real-time fact-checking.

📊 Success Stories: Real-World ML Case Studies
➜ How to Handle Large Text Inputs with Longformer and Hugging Face Transformers? This post is a tutorial on using Longformer with Hugging Face Transformers for processing long text inputs in NLP tasks. It covers installing necessary packages, loading datasets, fine-tuning models, and evaluating results for tasks like review classification.
➜ Integrating Confluent and Vertex AI with LLMs: This blog explains how integrating large language models (LLMs) with Confluent and Vertex AI automates SQL query generation, streamlining real-time data analytics. It enhances data exploration, report generation, pipeline optimization, and anomaly detection, addressing challenges like complex queries and real-time decision-making.
➜ What Makes a Great Data Business? This post discusses how to identify and evaluate data businesses, highlighting their high margins and value potential.
It covers key evaluation criteria: data sources, uses, nice-to-haves, and business models, providing a framework for private equity investors to spot valuable data businesses.🌍 ML Newsflash: Latest Industry Buzz & Discoveries➜ The Mystery Behind the PyTorch Automatic Mixed Precision Library: This article explains how to accelerate deep learning model training using Nvidia's automatic mixed precision (AMP) technique. It introduces Nvidia's Tensor cores, reviews the "Mixed Precision Training" paper, and demonstrates a 2X training speed-up for ResNet50 on FashionMNIST with minimal code changes.➜ Model Management with MLflow, Azure, and Docker: This article explains how to deploy MLflow, a tool for managing machine learning workflows, in a Docker container on Azure for scalability and collaboration. It covers MLflow's key components, focusing on MLflow Tracking, and provides a hands-on guide for setting up the system with Azure SQL Database and Blob Storage.➜ Teaching Your Model to Learn from Itself: This article explains pseudo-labeling, a semi-supervised learning technique that uses confident predictions from a model to label unlabeled data. 
A case study on the MNIST dataset demonstrates how pseudo-labeling boosted accuracy from 90% to 95% by iteratively adding confident predictions to the training set.
We’ve got more great things coming your way. See you soon!
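A quick postscript on the "3 Simple Ways to Merge Python Dictionaries" item from this issue: the three approaches it names can be sketched in a few lines. This is a minimal illustration, not the linked blog's own code. The dictionary names and values (`defaults`, `overrides`) are made up for the example, and the union operator assumes Python 3.9 or later.

```python
# Illustrative inputs; these names and values are assumptions for the sketch.
defaults = {"host": "localhost", "port": 8080}
overrides = {"port": 9090, "debug": True}

# 1. update(): modifies the dictionary it is called on,
#    so work on a copy to leave `defaults` unchanged.
merged_update = defaults.copy()
merged_update.update(overrides)

# 2. Dictionary unpacking: builds a new dictionary; later keys win.
merged_unpack = {**defaults, **overrides}

# 3. Union operator (Python 3.9+): also builds a new dictionary;
#    the right-hand operand wins on duplicate keys.
merged_union = defaults | overrides

print(merged_update)  # {'host': 'localhost', 'port': 9090, 'debug': True}
```

All three produce the same result here. The practical difference is that `update()` changes a dictionary in place, while unpacking and `|` return a new one.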

Merlyn from Packt
30 Aug 2024
13 min read

❇ NVIDIA NIM on SageMaker, Weaviate's StructuredRAG, Vectorlite v0.2.0, Imagen 3 on Vertex AI, Cerebras DocChat, Zyphra's Zamba2-mini, AWS DeepRacer

DeepSeek-AI’s Fire-Flyer AI-HPC, Microsoft’s Brain-Inspired AI Design, Fairness in Graph Filtering

👋 Hello,
Happy Friday! 🌟
Welcome to DataPro #109: Your Weekly Data Science & ML Digest! 🚀
This week’s edition is packed with exciting updates! Discover Table-Augmented Generation (TAG) for smarter querying, Vectorlite v0.2.0 for speedy SQL-powered search, Zyphra's Zamba2-mini, and Weaviate's StructuredRAG for reliable AI outputs. Plus, we’ve curated top resources to supercharge your ML models with enhanced accuracy and efficiency!

⚡ Tech Tidbits: Fresh Innovations and Tools
▪ AWS: Speed up AI inference with NVIDIA NIM on SageMaker and integrate Amazon Q with GitHub.
▪ Google ML: Explore multimodal search with BigQuery and get the lowdown on Imagen 3 on Vertex AI.
▪ Microsoft Research: Dive into brain-inspired AI design for next-gen tech.

📚 Hot Reads from Packt Library
▪ Data Science Fundamentals Pocket Primer: Your essential guide to data science concepts.
▪ Mastering Looker and LookML: Create insightful views, dashboards, and databases.
▪ AI and Expert Systems: Techniques and applications for solving real-world problems.

🔍 From Bits to BERT: LLMs & GPTs Spotlight
▪ TAG: Revolutionize database querying with a unified approach.
▪ Vectorlite v0.2.0: Get SQL-powered vector search with speed.
▪ StructuredRAG by Weaviate: Benchmark for reliable JSON outputs in AI.
▪ Cerebras DocChat: Fast, Llama 3-based GPT-4-level QA.
▪ Extension|OS: Open-source tool for on-demand AI access.
▪ AI21 Labs' Jamba 1.5: Quick, high-quality multilingual AI.
▪ LayerPano3D: AI framework for generating 3D scenes from text.
▪ Zyphra's Zamba2-mini: High-performance small language model.
▪ Fairness in Graph Filtering: Framework for better AI fairness.
▪ iAsk AI: Outperforming ChatGPT on MMLU Pro Test.
▪ DeepSeek-AI’s Fire-Flyer AI-HPC: Cost-effective deep learning solution.

✹ On the Radar: What’s New & Noteworthy
▪ New LLM Agents: Exploring the latest architecture.
▪ Pandas Power: Advanced plotting techniques.
▪ AWS DeepRacer: Bridging the Sim2Real gap.
▪ MarianMT Translation: Easy language translation with Hugging Face Transformers.
▪ Building Transformers: A guide to training from scratch.
▪ ML Optimization: Top tips for boosting algorithm performance.

Enjoy your weekend and stay ahead in the world of data science!
DataPro Newsletter is not just a publication; it’s a complete toolkit for anyone serious about mastering the ever-changing landscape of data and AI. Grab your copy and start transforming your data expertise today!

Calling Data & ML Enthusiasts!
Want to share your insights and build your online reputation? Contribute to our new Packt DataPro column! Discuss tools, share experiences, or ask questions. Gain recognition among 128,000+ data professionals and boost your CV. Simply reply with your Google Docs link or use our feedback form. Whether you’re looking for visibility or a discreet approach, we’re here to support you.
Share your content today and engage with our vibrant community! We’re excited to hear from you!
Take our weekly survey and get a free PDF copy of our best-selling book, "Interactive Data Visualization with Python - Second Edition." We appreciate your input and hope you enjoy the book!
Share Your Insights and Shine! 💬

📚 Expert Insights from Packt Community
Did you know? “Books are the quietest, most constant friends, holding the world’s treasured wisdom. They offer gentle guidance and timeless lessons, passing their rich inheritance from one generation to the next.”
We’re thrilled to bring you this week’s must-have new releases, straight from the experts to your bookshelf! Whether you're eager to enhance your skills or explore new horizons, now is the perfect moment to add these invaluable resources to your collection.
For a limited time, enjoy 30% off all eBooks at Packtpub.com.
These books are thoughtfully crafted by industry insiders with hands-on experience, offering unique insights you won’t find anywhere else.Don’t let these Packt-exclusive deals slip away—seize the opportunity to learn from the best at an unbeatable price!Order Today at $41.98 $59.99Data Science Fundamentals Pocket Primer: An Essential Guide to Data Science Concepts and TechniquesBy Mercury Learning and Information, Oswald CampesatoImagine having a go-to guide that gently walks you through the essentials of data science, making complex concepts feel accessible. This book does just that. With a blend of practical exercises and real-world examples, it simplifies the vast world of data science. Here’s what you’ll love:- A clear introduction to data science fundamentals.- Hands-on learning with practical examples.- Mastery of tools like Python, NumPy, Pandas, and R.- Techniques for data visualization to bring your data to life.Whether you're just starting or looking to sharpen your skills, this book is your companion on the journey to mastering data science.Get your copy now for $41.98 (originally $59.99).Order TodayMastering Looker and LookML - Complete Looker Guide for Developers: Master Looker and LookML to create views, dashboards, and databases with this guide [Video]By HHN Automate Book Inc.Embark on a journey to unlock the full potential of Looker with our all-encompassing course. 
Whether you’re new to Looker or looking to deepen your skills, this course guides you step-by-step through everything you need to know.Here’s what you can expect:- Hands-on tutorials for setting up your environment and connecting data.- In-depth exploration of LookML fields, parameters, and joins.- Advanced techniques for creating and managing impactful dashboards.By the end, you’ll have the confidence to create dynamic, data-driven insights that can drive meaningful decisions in your organization.Get the full video course now for $104.99 (MP4 download available).Order Today at $34.98 $49.99Artificial Intelligence and Expert Systems: Techniques and Applications for Problem SolvingBy Mercury Learning and Information ,I. Gupta ,G. NagpalDive into the world of AI with a guide that makes complex concepts approachable and practical. This book is your gateway to mastering AI, offering:- In-depth coverage of AI and expert systems.- Clear explanations paired with real-world applications.- Exploration of advanced topics like neural networks and fuzzy logic.From understanding the basics of AI to applying expert systems and neural networks, this book equips you with the tools to solve real-world problems. 
Perfect for anyone eager to enhance their knowledge of intelligent systems.Grab your copy now for $34.98 (originally $49.99).🔰 Data Science Tool Kit➀ NicolasHug/Surprise:Python scikit for building recommender systems with explicit rating data, emphasizing experiment control, dataset handling, and diverse prediction algorithms.➀ gorse-io/gorse:Open-source recommendation system in Go, designed for universal integration into online services, automating model training based on user interaction data.➀ recommenders-team/recommenders:Recommenders, a Linux Foundation project, offers Jupyter notebooks for building classic and cutting-edge recommendation systems, covering data prep, modeling, evaluation, optimization, and production deployment on Azure.➀ alibaba/Alink:Alink, developed by Alibaba's PAI team, integrates Flink for ML algorithms. PyAlink supports various Flink versions, maintaining compatibility up to Flink 1.13.➀ RUCAIBox/RecBole:RecBole, built on Python and PyTorch, facilitates research with 91 recommendation algorithms across general, sequential, context-aware, and knowledge-based categories.Access 100+ data tools in this specially curated blog, covering everything from data analytics to business intelligence—all in one place. Check out"Top 100+ Essential Data Science Tools & Repos: Streamline Your Workflow Today!"on PacktPub.com.⚡Tech Tidbits: Stay Wired to the Latest Industry Buzz!AWS ML Made Easy➀ Accelerate Generative AI Inference with NVIDIA NIM Microservices on Amazon SageMaker: The blog details NVIDIA's new NIM Inference Microservices integration with Amazon SageMaker, enabling fast, cost-effective deployment of large language models. 
It covers the use of prebuilt containers for efficient AI inferencing and provides a guide for setup and evaluation.➀ Connect the Amazon Q Business generative AI coding companion to your GitHub repositories with Amazon Q GitHub (Cloud) connector: This blog explains how incorporating generative AI, like Amazon Q Developer, can boost development productivity by up to 30% and streamline developer tasks. It details integrating Amazon Q Business with GitHub (Cloud) for natural language queries to manage repositories and enhance enterprise operations.Mastering ML with Google➀ Multimodel search using NLP, BigQuery and embeddings: This blog introduces a new era in search with multimodal embeddings, enabling text-based queries for images and videos. It showcases a demo for cross-modal search using Google Cloud Storage and BigQuery, allowing users to search for visual content through text queries.➀ A developer's guide to Imagen 3 on Vertex AI: The blog highlights user feedback on Imagen 3, emphasizing its need for high-quality, versatile image generation. It discusses improvements in artistic style, prompt adherence, and safety features like watermarking. Code examples illustrate creating photorealistic images and rendering text with the model.Microsoft Research Insights➀ Innovations in AI: Brain-inspired design for more capable and sustainable technology. Microsoft Research Asia, in collaboration with multiple institutions, is developing brain-inspired AI models to improve efficiency and sustainability. Key projects include CircuitNet for neural patterns, enhanced spiking neural networks (SNNs) for time-series prediction, and integrating central pattern generators for better sequence processing.🔍From Bits to BERT: Keeping Up with LLMs & GPTs➀ Table-Augmented Generation (TAG): A Unified Method for Improved Database Querying. Researchers from UC Berkeley and Stanford propose Table-Augmented Generation (TAG) to improve natural language queries over databases. 
TAG enhances query handling by combining query synthesis, execution, and answer generation, outperforming existing methods like Text2SQL and RAG in accuracy and complexity.
➤ Vectorlite v0.2.0: Fast, SQL-Powered Vector Search with SQLite Driver. Vectorlite v0.2.0 enhances performance by using Google’s highway library for vector distance, addressing hnswlib’s limitations on SIMD instruction support and vector normalization. The update improves speed significantly, especially on x64 platforms with AVX2, and is now SIMD-accelerated on ARM.
➤ StructuredRAG by Weaviate: Benchmark for Reliable JSON Output in AI. The StructuredRAG benchmark evaluates LLMs' ability to generate structured outputs like JSON. Testing Gemini 1.5 Pro and Llama 3 8B-instruct with various prompting strategies revealed an 82.55% success rate on average, with performance varying significantly by task and model.
➤ Cerebras DocChat: Llama 3-Based GPT-4-Level QA in Hours. Cerebras has released two models for document-based Q&A: Llama3-DocChat and Dragon-DocChat, trained quickly using Cerebras Systems. Llama3-DocChat builds on Llama 3, while Dragon-DocChat improves on Dragon+ with enhanced recall. Both models and their training data are open-source.
➤ Extension|OS: Open-Source Browser Tool for On-Demand AI Access. Extension|OS is a browser extension that integrates AI tools directly into web pages, allowing users to perform tasks like grammar checks and content edits without switching tabs. It features prompt customization, secure API key storage, and enhanced functionality with a Mixture of Agents.
➤ AI21 Labs' Jamba 1.5 Models: Speedy, Quality, Multilingual AI. AI21's Jamba 1.5 Open Model Family features the Jamba 1.5 Mini and Large models, built on the SSM-Transformer architecture. They offer the longest context window, exceptional speed, and high quality.
Jamba 1.5 models outperform competitors and support extensive enterprise applications.
➤ LayerPano3D: AI Framework for Consistent 3D Scene Generation from Text. LayerPano3D introduces a novel framework for generating full-view, explorable panoramic 3D scenes from a single text prompt. By decomposing 2D panoramas into layered 3D representations, it achieves high-quality, consistent views and immersive exploration, surpassing existing methods.
➤ Zyphra's Zamba2-mini: Efficient, High-Performance Small Language Model. Zamba2-1.2B improves hybrid SSM-transformer models by adding rotary embeddings and LoRA projectors for depth-specialization, enhancing performance. Developed to optimize model efficiency and accuracy, it’s applicable in real-world scenarios like advanced NLP tasks and code generation.
➤ Fairness in Graph Filtering: Framework for Theory and Mitigation Techniques. The paper addresses fairness in GNN-based recommendation systems, which often overlook consumer fairness. It evaluates a new method for adjusting fairness via fair graph augmentation. This approach consistently improves fairness across various GNN models and datasets, advancing recommendation system equity.
➤ iAsk Ai Outperforms ChatGPT and Others on MMLU Pro Test: The iAsk Pro model achieved a record 85.85% accuracy on the MMLU-Pro benchmark, surpassing all current LLMs, including GPT-4o, by over 13 percentage points. This dataset, with 12,000 complex questions, tests multi-task language comprehension rigorously. iAsk Pro's performance highlights its advanced reasoning and understanding capabilities, setting a new standard in AI evaluation.
➤ Lite Oute 2 Mamba2Attn 250M: 10X More Efficient AI. The Lite Oute 2 Mamba2Attn 250M model, using the new Mamba2 architecture with attention layers, boasts 250 million parameters and achieves high benchmark scores.
It was developed for improved efficiency and performance in various tasks, showing enhanced results in multiple evaluations compared to previous models.
➤ DeepSeek-AI Launches Fire-Flyer AI-HPC: Cost-Effective Deep Learning Solution. The Fire-Flyer AI-HPC architecture addresses high costs and energy demands in deep learning through hardware-software co-design. With 10,000 PCIe A100 GPUs, it cuts costs by 50% and reduces energy use by 40%, improving scalability and performance.

✨ On the Radar: Catch Up on What's Fresh

➤ Navigating the New Types of LLM Agents and Architectures: The post explores the evolution of AI agents from early ReAct models to the second generation of more structured, efficient agents. It introduces tools and frameworks for building these agents and highlights advancements in design and performance. Key insights include improvements in routing and state management.
➤ The Power of Pandas Plots: Backends. The article highlights how Pandas can leverage various visualization backends, such as Matplotlib, Plotly, and Hvplot, to enhance data visualization without extensive relearning. It shows how easy it is to switch between these backends for interactive and efficient plotting, emphasizing Hvplot's ease of use and integration.
➤ AWS DeepRacer: A Practical Guide to Reducing The Sim2Real Gap. The article focuses on training the AWS DeepRacer to safely navigate a track. It emphasizes creating a "safe" model that prioritizes staying on the track over speed. Key aspects include setting up the track, designing reward functions, and using a discrete action space. It details iterative training, starting with slower models and gradually increasing speed, to enhance both safety and performance. The final reward function balances staying on the track and adjusting speed for turns, with iterative improvements for increased reliability.
➤ How to Translate Languages with MarianMT and Hugging Face Transformers?
The article explains how to use MarianMT with Hugging Face Transformers for language translation. It covers installation, model selection, loading, tokenization, and translating text. The guide provides steps for translating to multiple languages and highlights MarianMT’s ease of use and effectiveness.
➤ How to Build and Train a Transformer Model from Scratch with Hugging Face Transformers? The Hugging Face Transformers library enables both the use of pre-trained models and the creation of custom transformer models from scratch. This tutorial guides you through setting up, tokenizing data, configuring, and training a transformer for sentiment classification, emphasizing the need for high-performance computing resources.
➤ 5 Tips for Optimizing Machine Learning Algorithms: This blog provides key tips for optimizing machine learning algorithms, focusing on data preparation, hyperparameter tuning, cross-validation, regularization, and ensemble methods. It aims to improve the accuracy, efficiency, and robustness of ML models for real-world applications.

See you next time!

Merlyn from Packt
02 Oct 2025
14 min read

Join Snyk at DevSecCon25 - Securing the Shift to AI Native

OpenAI’s Sora 2, Anthropic’s Claude Sonnet 4.5, Google Research’s ReasoningBank, oLLM

The future of secure AI-driven development is here, and DevSecCon25 is leading the conversation! Join us on October 22, 2025 for this one-day event to hear from leading experts in AI and security from Qodo, Ragie.ai, Casco, Arcade.dev, and more! The full agenda includes:

Mainstage - Hear inspiring keynotes from leaders in AI and cybersecurity. Expect forward-looking insights, industry thought leadership, and a vision of what’s next in the world of secure AI.
AI demos track - Bring your laptop and join us for interactive, hands-on demos under the theme “Build and Secure with AI.” You'll leave with skills you can immediately apply.
AI security track - Cutting-edge talks exploring the evolving security challenges of the AI era. Discover how to safeguard AI-driven applications, gain visibility into models, and secure agents across the SDLC.
Snyk innovation track - Experience the latest advancements from Snyk in this dynamic track featuring live product demos, major announcements, and customer success stories.

Don't miss this opportunity to gain the knowledge and strategies needed to embrace the AI revolution securely.

Sponsored

Welcome to DataPro Expert Insights #152

We’re excited to bring you another packed edition full of deep dives, practical tutorials, and cutting-edge updates in Data & AI.
This week, we’re thrilled to welcome Nishant Arora, Solutions Architect at AWS, to our newsletter portfolio, who will be sharing deep-dive insights on how AI is reshaping industries. In this issue, Nishant unpacks how trustworthy machine learning is being built into automotive and manufacturing systems, where safety, explainability, and regulatory readiness are mission critical.

But that’s just the beginning. This issue also rounds up some of the most important developments across the AI ecosystem:

🔹 OpenAI’s Sora 2 brings the next generation of video and audio generation, blending realism, controllability, and creativity.
🔹 Anthropic’s Claude Sonnet 4.5 sets new benchmarks in coding, reasoning, and agentic capabilities.
🔹 Google Research’s ReasoningBank shows how LLM agents can self-evolve by learning from both successes and failures.
🔹 oLLM demonstrates how 100K-context LLMs can now run on consumer GPUs with SSD offload.
🔹 Tutorials and explainers, from building advanced Agentic RAG pipelines and experimenting with LangGraph workflows, to demystifying the Gini Coefficient, preparing video data with vid-prepper, and writing your first Triton GPU kernel.
🔹 And finally, a timely look at how ensemble-based fact-checking systems can catch and neutralize repeating false claims before they spread.

This edition is designed to give you both strategic insights into AI’s role in industry transformation and hands-on knowledge to stay ahead in your technical work.

Let’s dive in.

Cheers,
Merlyn Shelley
Growth Lead, Packt

Trustworthy Machine Learning in Automotive: Safety, Explainability, and Regulation Readiness
Written by Nishant Arora, Solutions Architect at AWS

Artificial intelligence (AI) and machine learning (ML) are redefining the automotive industry. Cars are no longer just mechanical systems; they are intelligent, adaptive, and connected machines. Advanced driver-assistance systems (ADAS), predictive maintenance tools, and self-driving algorithms promise safer and more efficient transportation.
Yet, the integration of ML also raises pressing concerns: can we guarantee these systems behave safely, explain their choices, and comply with strict automotive standards?

Unlike recommendation systems or digital assistants, automotive ML operates in life-critical environments. A single wrong decision (misidentifying a pedestrian, miscalculating braking distance, or failing to detect sensor faults) could have irreversible consequences. This is why trustworthiness is not just a desirable property, but a precondition for adoption at scale.

Safety as the Core of Trust

In safety-critical applications, evaluating ML performance goes beyond accuracy. What matters is whether the system preserves safe operation under all circumstances. A useful framing is:

$P(\text{Safe} \mid \text{Model Decision})$

This probability expresses the likelihood that, given a model’s action, the outcome is safe. Accuracy alone does not guarantee that the rare but dangerous cases are adequately addressed.

Equally important is the ability to measure uncertainty. For example, an object recognition system in an autonomous car must know when it is unsure if a shadow is a pedestrian or just road texture. This can be modeled as predictive variance:

$\mathrm{Var}(y \mid x, \theta)$

where $y$ is the outcome for input $x$ under model parameters $\theta$. Systems that quantify uncertainty allow safer fallback strategies such as driver takeover or conservative control.

Safety can also be built directly into model training. A combined objective function might look like:

$L = L_{\text{accuracy}} + \lambda \cdot L_{\text{safety}}$

where $L_{\text{accuracy}}$ reflects predictive performance and $L_{\text{safety}}$ penalizes unsafe decisions, weighted by factor $\lambda$. In this way, the model learns not only to be correct, but also to respect predefined safety boundaries.

Finally, confidence calibration is vital. Regulators often require that predicted probabilities align with actual outcomes, ensuring that an ML model’s confidence is trustworthy:

$\mathbb{E}[\,|\hat{y} - y|\,] \le \epsilon$

where $\epsilon$ represents the maximum allowable deviation.
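The three quantities above can be made concrete with a small sketch. This is an illustrative Python example under stated assumptions, not the author's implementation: the function names, the sample numbers, and in particular the form of the safety penalty (an extra cost for low confidence on safety-critical positives) are hypothetical choices, and the variance is estimated from repeated stochastic predictions for a single input.

```python
import math
import statistics


def predictive_variance(samples):
    """Approximate Var(y | x, theta) from repeated stochastic predictions for one input."""
    return statistics.pvariance(samples)


def combined_loss(probs, labels, critical, lam=2.0):
    """L = L_accuracy + lam * L_safety for binary predictions."""
    eps = 1e-12  # avoids log(0)
    n = len(probs)
    # L_accuracy: standard binary cross-entropy.
    l_accuracy = -sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for p, y in zip(probs, labels)
    ) / n
    # L_safety (hypothetical form): extra cost for low confidence on
    # safety-critical positives, e.g. a pedestrian that is really there.
    l_safety = sum(c * y * (1 - p) for p, y, c in zip(probs, labels, critical)) / n
    return l_accuracy + lam * l_safety


def within_calibration_bound(probs, labels, epsilon):
    """Check the empirical version of E[|y_hat - y|] <= epsilon."""
    deviation = sum(abs(p - y) for p, y in zip(probs, labels)) / len(probs)
    return deviation <= epsilon, deviation


probs = [0.9, 0.2, 0.6]   # predicted P(pedestrian), made-up values
labels = [1, 0, 1]        # ground truth
critical = [1, 0, 1]      # 1 = a miss here would be unsafe

print(predictive_variance([0.55, 0.60, 0.65]))
print(combined_loss(probs, labels, critical, lam=2.0))
print(within_calibration_bound(probs, labels, epsilon=0.25))
```

Raising `lam` makes the safety term dominate, which is the trade-off the weighting factor λ expresses; a production system would likely use a domain-specific penalty rather than this simple one.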
Poor calibration can create dangerous overconfidence even when classification accuracy is high.

Explainability: Building Human Trust

Even a safe system will not be widely adopted if engineers, regulators, and customers cannot understand how it works. This is where explainable ML (XAI) becomes indispensable.

Some prominent methods include:

>> Feature attribution tools (e.g., SHAP, LIME) that show which sensor inputs or environmental factors most influenced a model’s decision.
>> Surrogate models, such as simple decision trees approximating a deep neural network, which make the decision boundary more interpretable.
>> Rule-based explanations, translating complex outputs into understandable logic: “if road is slippery and braking distance exceeds threshold, reduce speed.”

Such techniques allow developers to debug failures, give regulators evidence for certification, and help build public confidence in ML-driven cars.

Regulation and Safety Standards

Traditional automotive safety is governed by standards like ISO 26262, which defines processes and Automotive Safety Integrity Levels (ASILs). These were designed for deterministic, rule-based software. ML, by contrast, is probabilistic and data-driven, creating new challenges for compliance.

To bridge this gap, companies are adopting verification and validation (V&V) frameworks tailored for ML. These include large-scale simulation testing, corner-case scenario generation, and monitoring model drift once systems are deployed.
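As one illustration of drift monitoring, the sketch below computes a Population Stability Index (PSI) between a reference sample and a live sample of a single numeric feature. The article does not name a specific drift metric, so PSI is an assumption chosen for simplicity; the function name, bin count, and sample data are hypothetical.

```python
import math


def population_stability_index(reference, live, bins=10):
    """PSI between a reference sample and a live sample of one numeric feature."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def bin_fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        smooth = 1e-6  # keeps empty bins from producing log(0)
        total = len(values) + bins * smooth
        return [(c + smooth) / total for c in counts]

    ref, cur = bin_fractions(reference), bin_fractions(live)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


reference = [i / 100 for i in range(100)]      # e.g. training-time sensor readings
shifted = [0.5 + i / 200 for i in range(100)]  # live readings concentrated higher

print(population_stability_index(reference, reference))  # identical samples
print(population_stability_index(reference, shifted))    # drifted sample
```

A common rule of thumb treats a PSI above roughly 0.25 as a shift worth investigating, though any threshold used for certification evidence would need to be justified for the specific system.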
The aim is not just to test for accuracy, but to produce audit trails and evidence of robustness that regulators can certify.

Looking ahead, standards will likely evolve to explicitly account for ML, requiring documentation of uncertainty estimates, explainability reports, and continuous monitoring logs.

Emerging Pathways to Safer ML

Several technological approaches show promise in making automotive ML more trustworthy:

Cloud-Native MLOps
Cloud platforms now allow continuous retraining and redeployment of ML models as conditions shift (e.g., new road layouts or changing weather patterns). With automated testing pipelines, every new version can be checked against safety and compliance metrics before deployment.

Digital Twins and Safety-Constrained Reinforcement Learning
Digital replicas of cars and environments enable billions of simulated test miles without real-world risk. Reinforcement learning agents can be trained with explicit safety constraints, ensuring that unsafe behaviors are never reinforced.

Self-Monitoring Agentic AI
Future systems may integrate agentic AI that audits its own behavior in real-time. Such systems could flag potential regulatory violations, halt unsafe actions, or escalate control to human operators. This represents a step toward vehicles that self-enforce compliance rather than relying solely on external oversight.

Conclusion: Toward a Trustworthy Future

AI in automotive promises safer roads, lower maintenance costs, and smarter mobility. But none of this progress matters unless these systems are provably safe, transparent, and regulation ready.

Automakers must embed safety objectives directly into training and evaluation. Regulators must expand standards like ISO 26262 to incorporate probabilistic models. Cloud providers and technology partners must deliver the infrastructure for continuous monitoring and compliance assurance.

The next era of mobility will not be defined merely by how advanced ML models become, but by how much trust society places in them.
Only when AI systems are demonstrably safe, explainable, and aligned with regulatory frameworks will we see widespread adoption of truly autonomous and intelligent vehicles.

References

➖ ISO 26262:2018. Road Vehicles – Functional Safety. International Organization for Standardization.
➖ Amodei, D., Olah, C., Steinhardt, J., Christiano, P., Schulman, J., & Mané, D. (2016). Concrete Problems in AI Safety. arXiv preprint arXiv:1606.06565.
➖ Doshi-Velez, F., & Kim, B. (2017). Towards a rigorous science of interpretable machine learning. arXiv preprint arXiv:1702.08608.
➖ Kendall, A., & Gal, Y. (2017). What uncertainties do we need in Bayesian deep learning for computer vision? Advances in Neural Information Processing Systems (NeurIPS).
➖ Shapley, L. S. (1953). A value for n-person games. Contributions to the Theory of Games, 2(28), 307–317. (Basis for SHAP explainability methods.)
➖ National Highway Traffic Safety Administration (NHTSA). (2020). Automated Vehicles 4.0: Preparing for the Future of Transportation. U.S. Department of Transportation.

Highlights in Data & AI

⭕ Sora 2 is here - OpenAI: OpenAI has launched Sora 2, a next-generation video and audio generation model that’s more physically accurate, realistic, and controllable than its predecessor. It supports synchronized dialogue, sound effects, and cameo features that let users insert themselves into generated scenes. Released with a new iOS app, Sora 2 emphasizes creativity, social connection, and responsible usage.
⭕ Introducing Claude Sonnet 4.5 \ Anthropic: Anthropic has launched Claude Sonnet 4.5, its most powerful and aligned coding model yet. Excelling at reasoning, math, and real-world computer use, it powers complex agents with long focus spans. Alongside, Anthropic released upgrades to Claude Code, new app features, and the Claude Agent SDK.
It is available today via API, apps, and extensions; pricing matches Claude Sonnet 4.
⭕ Google AI Proposes ReasoningBank: A Strategy-Level AI Agent Memory Framework that Makes LLM Agents Self-Evolve at Test Time. Google Research introduces ReasoningBank, a memory framework that lets LLM agents learn from both successes and failures without retraining. By distilling interaction traces into reusable reasoning strategies, agents self-evolve across tasks. Paired with Memory-aware Test-time Scaling (MaTTS), the approach boosts effectiveness by up to 34% and reduces interaction steps by 16% on web and software-engineering benchmarks.
⭕ Meet oLLM: A Lightweight Python Library that brings 100K-Context LLM Inference to 8 GB Consumer GPUs via SSD Offload, No Quantization Required. oLLM is a lightweight Python library for running large-context Transformers on a single NVIDIA GPU by offloading weights and KV-cache to SSDs. Supporting models like Llama-3, GPT-OSS, and Qwen3-Next-80B, it avoids quantization, uses FP16/BF16 with FlashAttention-2, and enables up to 100K tokens on 8–10 GB VRAM. Designed for offline workloads, it trades throughput for practicality.
⭕ How to Build an Advanced Agentic Retrieval-Augmented Generation (RAG) System with Dynamic Strategy and Smart Retrieval? This tutorial demonstrates an Agentic Retrieval-Augmented Generation (RAG) system that goes beyond basic document lookup. Using embeddings, FAISS, and a mock LLM, the agent decides when retrieval is needed, selects strategies (semantic, multi-query, temporal, hybrid), and synthesizes context-aware responses. The result is a more adaptive, transparent, and intelligent RAG pipeline for practical use.
⭕ Beyond ROC-AUC and KS: The Gini Coefficient, Explained Simply. This tutorial explains the Gini Coefficient as a classification metric alongside ROC-AUC and KS. Using the German Credit dataset, it walks through sorting predictions, plotting Lorenz curves, calculating areas, and deriving Gini.
The result shows how Gini measures a model’s ability to rank positives over negatives, with higher values indicating stronger separation and near-perfect classification.
⭕ How to Build Effective Agentic Systems with LangGraph? Amid rising agentic frameworks for powerful models (GPT-5, Gemini 2.5 Pro), this article introduces LangGraph, an agentic framework that abstracts state, tool-calling, and routing to build workflows. Using a document CRUD example, it shows graph-based intent routing (nodes, edges, state). Pros: easy setup, open-source, cleaner code. Cons: some boilerplate and framework-specific errors. Compared with LangChain/LlamaIndex/CrewAI, LangGraph balances control and productivity.
⭕ Preparing Video Data for Deep Learning: Introducing Vid Prepper: This new product, vid-prepper, is an open-source Python package designed to make video preprocessing for machine learning and deep learning faster and more efficient. It provides tools for analyzing metadata, filtering out problematic files, standardizing formats/codecs/frame rates, detecting shots and objects, and converting videos into tensors. The goal is to reduce costs, avoid training bottlenecks, and simplify large-scale video data preparation.
⭕ Learning Triton One Kernel At a Time: Vector Addition. This tutorial introduces GPU programming with Triton by walking through vector addition as your first kernel. It explains GPU architecture basics, optimization principles like reducing memory bandwidth costs and operator fusion, and shows how Triton abstracts CUDA complexity. By writing a simple vector addition kernel, readers learn how to map work across threads, manage memory, and build efficient GPU code.
⭕ Building Fact-Checking Systems: Catching Repeating False Claims Before They Spread. This article explores how automated fact-checking systems can catch repeating false claims before they spread.
It introduces previously fact-checked claim retrieval (PFCR), where claims are matched against existing verified ones, saving time and improving accuracy. Using retrieval-reranker pipelines and ensemble methods that combine lexical and semantic models, the approach makes fact-checking faster, scalable, multilingual, and more reliable in today’s digital information ecosystem.

See you next time!
Merlyn from Packt
08 Sep 2025
9 min read

Real-World Lessons From 50+ Agentic Orchestration Projects, Gemini Cloud Assist for Spark, NetoAI’s TSLAM: First Open-Source Telecom LLM, ARGUS Recommender

Google’s Personal Health Agent, Bioinformatics AI Agent in Colab using Biopython

[📽️ Webinar] Still guessing where to start with AI? We’ll show you.

Don’t miss this exclusive presentation on September 30.

Tuesday, September 30 | 11:00 AM ET / 8:00 AM PT

Over the past several months, Camunda has worked with more than 50 customers to design and implement agentic orchestration solutions. This gave us a front-row view into how organizations are using AI agents to reshape operations: what works, what doesn’t, and what to do next.

In this session, our team will share key takeaways from deployments across banking, insurance, healthcare, telecom, and other industries. We'll cover:

Emerging patterns and proven best practices
Common pitfalls to watch out for
How AI agents integrate with human decision-making
Measurable outcomes in speed, accuracy, and customer experience

Whether you’re just starting your AI automation journey or scaling enterprise-wide, you’ll leave with practical guidance to make agentic orchestration work in your organization.

Sponsored

Your Weekly Dose of Data & ML - Connecting Challenges to Breakthroughs

Welcome to DataPro #148, your trusted guide through the fast-moving world of data science, machine learning, and AI infrastructure.
Every week, we connect the toughest problems researchers and engineers face with the solutions shaping the next wave of innovation.

This edition covers breakthroughs where AI directly tackles long-standing pain points:

Faster Spark troubleshooting: Google’s Gemini Cloud Assist pinpoints failures and bottlenecks in minutes, replacing hours of log-diving.
Next-gen recommender systems: Yandex’s ARGUS scales to a billion parameters, capturing long user histories and driving record engagement.
Personalized health AI: Google’s Personal Health Agent orchestrates multiple agents to deliver accurate, trusted health guidance.
Domain-specific LLMs: NetoAI’s TSLAM, trained on AWS Trainium, becomes the first open-source telecom LLM, cutting costs and boosting accuracy by 37%.

Also inside: a Colab-ready Bioinformatics AI Agent with Biopython, Baseten’s 225% inference efficiency gains, FineVision’s 24M multimodal dataset, and new methods in DeepSpeed, LangExtract, Random Forest tuning, and Flink CMK encryption.

At DataPro, we believe keeping up with data and AI isn’t about chasing hype; it’s about understanding how problems get solved, and how those solutions expand what’s possible.

Cheers,
Merlyn Shelley
Growth Lead, Packt

Top Tools Driving New Research 🔧📊

🔸 Meet ARGUS: A Scalable AI Framework for Training Large Recommender Transformers to One Billion Parameters. Yandex introduced ARGUS, a transformer-based recommender framework scaling to one billion parameters. It tackles long-standing issues of short memory, scalability, and adaptability by modeling extended user histories up to 8,192 interactions. Innovations include dual-objective pre-training, scalable encoders, and efficient fine-tuning. Deployed on Yandex Music, ARGUS achieved record gains: +2.26% listening time and +6.37% likes.
This positions Yandex alongside Google, Netflix, and Meta as leaders in large-scale recommender systems.

🔸 Google AI Introduces Personal Health Agent (PHA): A Multi-Agent Framework that Enables Personalized Interactions to Address Individual Health Needs. Google introduced the Personal Health Agent (PHA), a multi-agent framework built on Gemini 2.0 that integrates data science, domain expertise, and health coaching via an orchestrator. Evaluated on 10 benchmarks with 7,000+ annotations and 1,100 expert hours, PHA outperformed baseline models in accuracy, personalization, and trust. Though still at the research stage, it sets a blueprint for modular, agentic health AI capable of reasoning across multimodal data.

🔸 How Baseten achieves 225% better cost-performance for AI inference: Baseten, in partnership with Google Cloud and NVIDIA, achieved 225% better cost-performance for high-throughput AI inference and 25% for latency-sensitive workloads using A4 VMs (NVIDIA Blackwell) and Google Cloud’s Dynamic Workload Scheduler. By combining cutting-edge GPUs, TensorRT-LLM, Dynamo, and multi-cloud redundancy, Baseten delivers scalable, resilient inference. This lowers costs and unlocks real-time, production-ready AI applications across industries, from agentic workflows to media and healthcare.

Topics Catching Fire in Data Circles 🔥💬

🔸 Implementing DeepSpeed for Scalable Transformers: Advanced Training with Gradient Checkpointing and Parallelism. This advanced DeepSpeed tutorial demonstrates how to efficiently train large transformers using ZeRO optimization, FP16 mixed precision, gradient accumulation, and advanced parallelism. It covers full workflows: model setup, dataset creation, GPU memory monitoring, checkpointing, inference, and benchmarking ZeRO stages.
Learners gain hands-on practice with gradient checkpointing, CPU offloading, and advanced features like pipeline and MoE parallelism, making large-scale LLM training accessible even in resource-limited environments like Colab.

🔸 Troubleshoot Apache Spark on Dataproc with Gemini Cloud Assist AI: Google Cloud introduced Gemini Cloud Assist Investigations for Dataproc and Serverless for Apache Spark, an AI-powered tool that diagnoses job failures and performance bottlenecks. It analyzes logs, metrics, and configs across services to pinpoint root causes, whether infrastructure, configuration, application, or data issues, and provides actionable fixes. Accessible via console or API, it accelerates troubleshooting, boosts team efficiency, and empowers engineers without deep Spark expertise to resolve issues quickly.

🔸 Extracting Structured Data with LangExtract: A Deep Dive into LLM-Orchestrated Workflows: LangExtract is a workflow library for LLM-based structured extraction that addresses schema drift and missing facts via prompt orchestration, chunking, and optional parallel or multi-pass extraction. It fine-tunes prompts per model, manages token limits, and streams results as generator outputs. A hands-on demo ingests the TechXplore RSS feed, filters articles, runs few-shot extractions (e.g., sectors, metrics, values, regions), and aggregates results into dataframes. Best practices: rich examples, 2+ extraction passes, and a tuned max_workers.

🔸 The Beauty of Space-Filling Curves: Understanding the Hilbert Curve. The Hilbert curve, a classic space-filling curve, links 1D order to n-D coordinates while preserving locality, which is vital for big-data systems (e.g., Databricks liquid clustering) and ML on spatial data. The article surveys SFC history (Peano → Hilbert), properties (continuous, surjective, Hausdorff dimension 2), and a practical implementation using Skilling’s algorithm (binary → Gray code, bit disentanglement, XOR rotations) for fast index ↔ coordinate mapping.
Applications include partitioning, clustering, indexing, compression, and efficient range queries with fewer fragmented clusters.

New Case Studies from the Tech Titans 🚀💡

🔸 How to Create a Bioinformatics AI Agent Using Biopython for DNA and Protein Analysis. Build a Bioinformatics AI Agent in Colab using Biopython to streamline DNA/protein analysis. The tutorial wraps sequence fetching (NCBI), composition/GC%/MW, translation and protein stats, MSA, phylogenetic trees, motif search, codon usage, and GC sliding windows into one class with Plotly/Matplotlib visuals. Start with sample sequences (SARS-CoV-2 Spike, Human Insulin, E. coli 16S) or custom accessions. It’s a hands-on, end-to-end pipeline for education, research, and rapid prototyping.

🔸 How NetoAI trained a Telecom-specific large language model using Amazon SageMaker and AWS Trainium. NetoAI built TSLAM, the first open-source telecom-specific LLM, by fine-tuning Llama-3.1-8B with LoRA on AWS Trainium (Trn1) via Amazon SageMaker. Trainium cut training time to under 3 days and lowered costs, while SageMaker ensured scalability and compliance. Deployed on AWS Inferentia2, TSLAM delivers low-latency inference for real-world telco agents (fault diagnosis, customer service, planning, config management). Results: 86.2% accuracy vs. 63.1% for the base model, a relative gain of roughly 37%, with plans to scale further on Trn2.

🔸 Zero-Inflated Data: A Comparison of Regression Models: Zero-inflated data occurs when a dataset has far more zeros than a standard count model would predict, such as bike-usage counts where most people report zero days. Standard Poisson regression struggles with this, so specialized models work better. The Zero-Inflated Poisson (ZIP) model handles excess zeros by combining a Bernoulli zero model with a Poisson count model, while hurdle models first predict zero vs. non-zero and then model only the positives.
In practice, both outperform Poisson or linear regression, with hurdle models offering a faster, solid fit and ZIP excelling when the data truly follows a zero-inflated pattern.

Blog Pulse: What’s Moving Minds 🧠✨

🔸 Hugging Face Open-Sourced FineVision: A New Multimodal Dataset with 24 Million Samples for Training Vision-Language Models (VLMs). Hugging Face released FineVision, a massive open multimodal dataset with 17.3M images, 24.3M samples, and 10B tokens, built from 200+ sources and carefully cleaned, rated, and deduplicated. Covering domains from VQA and OCR to charts, science, and GUI navigation, it delivers up to 46% performance gains over prior datasets, with only 1% benchmark leakage. Fully open-sourced, FineVision sets a new standard for training robust, diverse, and reproducible vision-language models.

🔸 Achieve full control over your data encryption using customer managed keys in Amazon Managed Service for Apache Flink. Amazon Managed Service for Apache Flink now supports customer managed keys (CMKs) in AWS KMS, giving organizations full control over data encryption for checkpoints, snapshots, and running state. While the service already encrypts data by default with AWS-owned keys, CMKs let you manage lifecycle policies, enforce least-privilege access, and meet strict compliance requirements. Enabling CMKs involves defining IAM/operator policies, updating the application with the CMK, and restarting for changes to take effect. Supported from Flink runtime 1.20, this feature balances strong security with operational flexibility.

🔸 A Visual Guide to Tuning Random Forest Hyperparameters: This post explores how hyperparameter tuning affects Random Forests, using the California housing dataset. A default forest (100 trees, unlimited depth) already outperforms tuned decision trees, highlighting the strength of ensembles. Visualizations of trees, predictions, errors, and feature importances show how forests reduce variance.
Experiments with depth limits, n_estimators, n_jobs, and Bayes search reveal trade-offs: more trees or tuning slightly improve metrics (MAE ~0.31, R² ~0.83) but greatly increase training time. Takeaway: Random forests offer strong performance out of the box, but tuning brings marginal gains at significant computational cost.

See you next time!
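The zero-inflated and hurdle models summarized in the regression comparison above can be made concrete with their probability mass functions. The following is a minimal sketch, not the article's code: the function names and the parameters `pi`, `p_zero`, and `lam` are illustrative assumptions, and the hurdle model is assumed to use a zero-truncated Poisson for the positive counts, which is one common choice.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of observing count k under a plain Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def zip_pmf(k: int, pi: float, lam: float) -> float:
    """Zero-Inflated Poisson (assumed parameterization).

    With probability pi the observation is a structural zero (the
    Bernoulli part); otherwise it is drawn from Poisson(lam), which can
    itself produce a zero.
    """
    base = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + base if k == 0 else base


def hurdle_pmf(k: int, p_zero: float, lam: float) -> float:
    """Hurdle model (assumed zero-truncated Poisson for positives).

    Stage 1 decides zero vs. non-zero with probability p_zero; stage 2
    models only the positive counts, so all zeros come from stage 1.
    """
    if k == 0:
        return p_zero
    truncation = 1.0 - math.exp(-lam)  # P(Poisson > 0)
    return (1.0 - p_zero) * poisson_pmf(k, lam) / truncation


if __name__ == "__main__":
    # Compare how much probability each model puts on small counts.
    for k in range(4):
        print(k, poisson_pmf(k, 2.0), zip_pmf(k, 0.6, 2.0), hurdle_pmf(k, 0.6, 2.0))
```

The difference in where zeros come from is the design point: in ZIP a zero can arise from either component, while in the hurdle model the count component never produces a zero.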

Merlyn from Packt
04 Sep 2025
8 min read

DataPro Expert Insight: Agentic AI: The Next Leap in Intelligent Systems

From Prompt to Purpose: Agentic AI and the Rise of Autonomous Intelligence

Welcome to DataPro 147: Expert-Led Edition
Your Weekly Brief on What’s Next in AI, ML, and Data Engineering

This week, we’re featuring an expert insight from Sagar Lad, Data & AI Solution Architect, who unpacks a pivotal evolution in artificial intelligence: the emergence of Agentic AI, intelligent systems that don’t just respond, but pursue goals, adapt in real time, and collaborate with other agents to get things done.

For data scientists, ML engineers, and AI practitioners, Agentic AI marks a fundamental shift. Most of today’s AI systems are reactive: they answer prompts, complete predefined tasks, or generate outputs within limited contexts. Agentic systems are different. They perceive, reason, act, and learn, enabling multi-step autonomy in enterprise and real-world environments.

In this technical deep dive, Sagar explores:

🔹 What Agentic AI is and why it matters for the next wave of AI systems
🔹 How modern architectures blend LLMs, memory, tool use, and orchestration
🔹 The enabling technologies: LangChain, Semantic Kernel, vector databases, cloud-native platforms, and more
🔹 Challenges like LLM brittleness, multi-agent coordination, and security risks
🔹 How Agentic AI is already finding footholds in data engineering workflows, MLOps, and autonomous decision systems

If you’re working at the edge of data and intelligence, this is the edition to bookmark.

Let’s dive in 👇

Cheers,
Merlyn Shelley
Growth Lead, Packt

Agentic AI: The Next Leap in Intelligent Systems | by Sagar Lad

Artificial Intelligence has already transformed industries with predictive analytics, natural language understanding, and generative capabilities. But most AI systems today are reactive: they respond to prompts, execute predefined tasks, or generate outputs within bounded contexts.
The next evolution is Agentic AI: systems that can act autonomously, pursue goals, adapt to environments, and coordinate with other agents to achieve outcomes with minimal human intervention.

This article explores what Agentic AI is, why it matters, its architectural principles, key enablers, technical challenges, and enterprise applications.

What is Agentic AI?

At its core, Agentic AI represents a shift from stateless, prompt-driven systems (e.g., today’s chatbots and LLMs) to autonomous, goal-oriented agents. An agentic AI system can:

Perceive: Gather information from structured and unstructured sources (APIs, sensors, documents).
Reason: Apply contextual knowledge, logic, and planning to determine the best course of action.
Act: Execute tasks, trigger workflows, or interact with digital/physical systems.
Adapt: Learn from feedback, outcomes, and environment changes to improve future performance.

Figure: Agentic AI at its Core

Unlike traditional automation or AI models that need constant supervision, agentic systems can plan, prioritize, and execute multi-step tasks independently.

The convergence of several technological trends is accelerating the rise of Agentic AI:

Large Language Models (LLMs) as Reasoning Engines: Modern LLMs can interpret vague instructions, break them into sub-tasks, and suggest solutions.
Tool Augmentation: APIs and plugins extend AI capabilities beyond text generation into search, data retrieval, code execution, and robotic control.
Memory Architectures: Vector databases and knowledge graphs allow agents to store, recall, and refine knowledge over time.
Orchestration Frameworks: Platforms like LangChain, Semantic Kernel, and Microsoft Prompt Flow enable chaining of multiple reasoning steps and tool calls.
Cloud-Native AI Platforms: Services like Azure AI Foundry and Amazon Bedrock are simplifying deployment and scaling of multi-agent systems.

This technological maturity makes it possible to design agents that can operate with goal-directed autonomy while still adhering to enterprise safety, governance, and compliance standards.

Architectural Principles of Agentic AI

Agentic AI solutions typically follow a layered architecture:

Perception Layer: Responsible for gathering and interpreting data from the environment. Technologies include sensors, Natural Language Processing (NLP), and Computer Vision to perceive text, images, and speech.
Cognitive Layer: The brain of the system, encompassing reasoning and decision-making. Employs machine learning models, including reinforcement learning, to analyze inputs and predict outcomes.
Action Layer: Executes decisions through physical or digital means. Incorporates feedback loops for self-correction and continuous improvement.
Communication Layer: Enables interaction with users and other systems. Supports multimodal communication (e.g., text, voice, visual) for seamless integration.

This modular design ensures that agents are not “black boxes” but traceable, governed systems that can fit into enterprise architecture.

Key Enablers

1. Autonomous Planning
Agents can break down goals into sub-goals and dynamically re-plan when obstacles occur. For example, an AI project manager could reassign tasks if a resource becomes unavailable.

2. Tool Use and API Integration
By connecting to enterprise systems (like SAP, Salesforce, or Azure DevOps), agents move from knowledge workers to execution workers.

3. Multi-Agent Collaboration
Instead of a single agent, ecosystems of specialized agents can cooperate. Example: one agent handles data retrieval, another validates compliance, while a third presents the final report.

4. Persistent Memory
Unlike stateless chatbots, agentic systems remember previous interactions, allowing continuity in long-term projects or customer engagements.

5. Responsible AI Controls
Agentic AI cannot succeed without robust guardrails: bias detection, safety filters, role-based access, and explainability features.

Challenges in Building Agentic AI

Despite the potential, several technical and organizational challenges must be addressed:

Reliability of LLM Reasoning: Current models may hallucinate or produce brittle plans. Agents must include validation and error recovery.
Scalability of Multi-Agent Systems: Coordinating multiple agents without excessive overhead is non-trivial.
Integration Complexity: Enterprises run heterogeneous systems; seamless API orchestration is essential.
Security Risks: Autonomous agents with execution powers increase risks of unauthorized actions, data leakage, or adversarial prompts.
Ethical and Compliance Concerns: Decisions must align with legal and regulatory requirements, particularly in sensitive domains like healthcare and finance.

Enterprise Applications

Software Engineering
Agents that debug code, run unit tests, and deploy fixes.
Autonomous backlog grooming and sprint planning.

Data & Analytics
Automated data quality checks, lineage tracing, and governance enforcement.
Agents that query data warehouses, generate insights, and prepare visualizations.

Customer Experience
Proactive agents that resolve issues without waiting for customer complaints.
Multi-modal support agents integrating voice, chat, and visual instructions.

Business Operations
Intelligent RPA 2.0: replacing static workflows with adaptive agents.
Supply chain optimization: monitoring inventory, predicting delays, re-routing shipments.

Knowledge Management
Continuous synthesis of insights from documents, emails, and reports.
Agents that maintain living enterprise knowledge bases.

The Road Ahead

Agentic AI represents a paradigm shift: from “AI as a tool” to “AI as a collaborator.” The near future will likely see:

Standardization of Agent Frameworks: Interoperability between different orchestration tools and vendors.
Enterprise AI Operating Systems: Platforms that manage agent lifecycles, policies, and performance.
Specialized Industry Agents: Domain-specific agents trained on healthcare protocols, financial compliance, or manufacturing processes.
Human-Agent Collaboration Models: Workflows where humans define intent and agents execute while keeping humans in control of critical decisions.

Conclusion

Agentic AI has the potential to transform enterprises from data-driven to goal-driven organizations. By combining reasoning, memory, and autonomous action, agents can handle complex workflows that once required human supervision. Yet, this power must be matched with strong governance, safety, and ethical oversight.

For technical leaders, the challenge is not just building powerful agents, but building trustworthy ones. The organizations that succeed will be those that strike the right balance between autonomy and accountability, unlocking productivity gains while maintaining control.

The age of Agentic AI has begun, not as a replacement for human intelligence, but as a force multiplier that augments human capabilities and accelerates digital transformation.

Dive deeper and read the full piece on PacktHub Medium.

We’ll be back with more soon!
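The perceive, reason, act, and adapt cycle described in the article can be sketched as a small control loop. This is a toy illustration under stated assumptions, not the author's architecture: `ThermostatAgent`, its fields, and `run_agent` are hypothetical names, and simple rule-based logic stands in for the LLM-based reasoning the article has in mind.

```python
from dataclasses import dataclass, field


@dataclass
class ThermostatAgent:
    """Hypothetical toy agent that steers a room toward a target temperature."""

    target: float
    step: float = 1.0                            # how strongly each action changes the room
    memory: list = field(default_factory=list)   # persistent record of past cycles

    def perceive(self, environment: dict) -> float:
        # Perceive: read the current state from the environment.
        return environment["temperature"]

    def reason(self, temperature: float) -> str:
        # Reason: choose an action that moves toward the goal (0.5 degree tolerance).
        if temperature < self.target - 0.5:
            return "heat"
        if temperature > self.target + 0.5:
            return "cool"
        return "idle"

    def act(self, action: str, environment: dict) -> None:
        # Act: apply the chosen action to the environment.
        delta = {"heat": self.step, "cool": -self.step, "idle": 0.0}[action]
        environment["temperature"] += delta

    def adapt(self, before: float, after: float) -> None:
        # Adapt: if the last action overshot the target, act more gently next time.
        if (before - self.target) * (after - self.target) < 0:
            self.step /= 2

    def run_cycle(self, environment: dict) -> str:
        before = self.perceive(environment)
        action = self.reason(before)
        self.act(action, environment)
        after = self.perceive(environment)
        self.adapt(before, after)
        self.memory.append((before, action, after))
        return action


def run_agent(agent: ThermostatAgent, environment: dict, max_cycles: int = 20) -> dict:
    """Run cycles until the agent decides no action is needed or the budget runs out."""
    for _ in range(max_cycles):
        if agent.run_cycle(environment) == "idle":
            break
    return environment


if __name__ == "__main__":
    room = {"temperature": 15.0}
    agent = ThermostatAgent(target=20.0, step=2.0)
    run_agent(agent, room)
    print(room, agent.memory)
```

The `max_cycles` budget and the recorded `memory` are small stand-ins for the guardrails and traceability the article calls for: the loop cannot run unbounded, and every decision leaves a record that can be audited.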

Merlyn from Packt
21 Aug 2025
7 min read

DataPro Expert Insight: Data Products – Turning Data into Tangible Value

Welcome to DataPro #146: Expert Insight Edition.

We’re excited to bring on board Sagar Lad, Lead Data Solution Architect at a leading Dutch bank, to the Expert Insight edition of the DataPro newsletter. Sagar will be sharing his hard-won lessons, practical tips, and implementation strategies for navigating the challenges of data in the Gen AI and Agentic AI era.

Each week, Sagar will guide you through his in-depth analysis and research, showing what really works in complex production environments. His goal is simple: help you turn concepts into practice and ideas into impact.

This week, he kicks things off with a deep dive into Data Products: Turning Data into Tangible Value. As always, our mission at DataPro is to bring you first-hand, practical insights from industry experts. We believe Sagar’s expertise will provide valuable guidance you can apply directly to your daily data practice.

So, without further ado, let’s jump in.

Cheers,
Merlyn Shelley
Growth Lead, Packt

Data Products: Turning Data into Tangible Value - By Sagar Lad

In today’s digital economy, data has become one of the most valuable assets for organizations. Every transaction, interaction, and process generates data that, when properly harnessed, can unlock powerful insights, drive innovation, and create competitive advantages.
However, simply collecting and storing vast amounts of data is not enough. To truly realize its value, organizations must transform data into usable, scalable, and outcome-driven solutions. This is where the concept of a data product comes into play.

A data product is not just raw data, but rather a packaged, consumable, and value-generating asset built on top of data. Just as traditional products solve customer needs, data products solve business challenges by delivering insights, predictions, or automated decisions in a way that is accessible and reliable for end users.

What is a Data Product?

At its core, a data product is a solution designed around data to serve a specific purpose or generate business value. It could take many forms, such as a dashboard, an API serving machine learning predictions, a recommendation engine, or even a dataset curated for a particular domain.

For example, Netflix’s recommendation system is a data product built to enhance user engagement.

Characteristics of a data product include:

1. Purpose-driven: It is built to achieve a clear outcome (e.g., increase sales, reduce costs, improve customer satisfaction).
2. Reusable: A well-designed data product can serve multiple teams or applications.
3. Consumable: It is packaged in a way that non-technical users or systems can leverage it seamlessly.
4. Scalable: It is designed to evolve with changing business needs and data volumes.

Figure: Data Product: Bridge between Producer & Consumer

Data Products vs. Data Assets

It is important to differentiate between data assets and data products.

A data asset could be a data lake, warehouse, or dataset that stores raw or processed data.
While valuable, assets by themselves may not generate outcomes unless someone analyzes them.

A data product, on the other hand, transforms these assets into actionable, consumable outputs that stakeholders can directly use to make decisions or power business processes.

In other words, data assets are ingredients, while data products are the finished dishes that customers can consume.

Why Do Organizations Need Data Products?

Organizations often struggle with extracting value from their data investments. Billions of dollars are spent globally on data platforms, yet many businesses face the “last mile problem”, where insights fail to reach decision-makers in a meaningful way. Data products help bridge this gap by operationalizing data and embedding it into workflows.

Key benefits of data products include:

1. Faster Decision-Making
With well-packaged insights, business users don’t need to spend hours querying databases or waiting for reports. A data product like a sales forecasting model can instantly provide actionable intelligence.

2. Democratization of Data
Data products abstract technical complexity, enabling business users, analysts, and applications to easily consume data-driven insights.

3. Standardization and Reusability
Instead of rebuilding analytics pipelines repeatedly, a single data product can serve multiple business units. For example, a customer segmentation data product could be reused by marketing, sales, and product teams.

4. Scalability and Automation
Data products, once designed, can be scaled to handle growing data volumes and embedded into automated workflows.

5. Value Realization
Ultimately, data products help organizations move beyond storing data to monetizing and operationalizing it, whether through cost savings, revenue generation, or improved customer experiences.

Key Principles for Designing Data Products

Designing a successful data product requires more than technical skills; it requires product thinking.
Some guiding principles include:

1. Start with Business Value
A data product must solve a real business problem. Before building, clearly define the outcome it should drive.

2. User-Centric Design
The product should be intuitive for its target users, whether that’s executives, developers, or customers.

3. Trust & Transparency
Users must trust the data product. This requires data quality checks, explainability in AI models, and governance measures.

4. Scalability & Reusability
Build products that can adapt to future needs, serve multiple stakeholders, and scale across datasets and domains.

5. Operationalization
A data product should integrate seamlessly into business workflows and systems, rather than existing as a standalone artifact.

6. Monitoring & Improvement
Data products must be continuously monitored for performance, accuracy, and relevance, with feedback loops for improvements.

Challenges in Building Data Products

While data products are powerful, organizations face challenges in creating and scaling them:

1. Data Quality Issues: Poor data leads to unreliable products.
2. Cultural Resistance: Teams may hesitate to trust automated insights.
3. Lack of Product Mindset: Many companies treat data as IT projects, not products.
4. Scalability Hurdles: A data product may work for a pilot but struggle in enterprise-wide deployments.
5. Governance & Compliance: Ensuring data products adhere to regulatory and ethical standards is critical.

Overcoming these requires strong data governance, clear ownership, cross-functional collaboration, and a product-centric approach.

The Role of Data Mesh and Data Products

The concept of data products is also central to Data Mesh architecture. In Data Mesh, each domain team is responsible for building and managing its own data products, treating them as first-class citizens.
This shifts ownership from centralized IT teams to domain experts, making data products more relevant, accurate, and consumable.

By combining Data Mesh principles with robust product management practices, organizations can scale their data strategy while ensuring alignment with business outcomes.

Future of Data Products

The future of data products looks promising as technology evolves:

1. AI-driven Data Products: With advancements in generative AI, data products will become more conversational, adaptive, and personalized.
2. Marketplace of Data Products: Organizations may buy and sell data products just like SaaS solutions, creating new revenue streams.
3. Self-Service Ecosystems: Business users will increasingly be able to design their own data products using no-code/low-code platforms.
4. Embedded Trust & Ethics: As AI governance matures, responsible AI principles will be embedded directly into data products.

Conclusion

Data products represent a fundamental shift in how organizations leverage data. They move beyond static reports or siloed datasets to create reusable, scalable, and outcome-driven solutions. By applying product thinking to data initiatives, companies can ensure that data investments directly translate into measurable business value.

In a world where data is the new currency, data products are the vehicles that convert raw information into tangible value.
The organizations that master this art will be the ones that thrive in the data-driven future.

See you next time!
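The design principles above (a stated purpose, clear ownership, and data quality checks that underpin trust) can be illustrated with a small descriptor object. This is a hedged sketch, not a standard or the author's design: the `DataProduct` class, its fields, and the example product are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor that packages a dataset with its contract."""

    name: str
    owner: str     # accountable domain team
    purpose: str   # business outcome the product exists to drive
    schema: dict   # column name -> expected Python type
    checks: dict = field(default_factory=dict)  # check name -> predicate on a row

    def validate(self, rows: list) -> list:
        """Return human-readable problems; an empty list means the batch passes."""
        problems = []
        for index, row in enumerate(rows):
            schema_problems = [
                f"row {index}: {column} is not {expected.__name__}"
                for column, expected in self.schema.items()
                if not isinstance(row.get(column), expected)
            ]
            problems.extend(schema_problems)
            if schema_problems:
                continue  # skip quality checks on rows that break the schema
            for check_name, predicate in self.checks.items():
                if not predicate(row):
                    problems.append(f"row {index}: failed {check_name}")
        return problems


# Example product: the customer segmentation case mentioned in the article.
segments = DataProduct(
    name="customer_segments",
    owner="marketing-analytics",
    purpose="Reusable customer segments for marketing, sales, and product teams",
    schema={"customer_id": str, "annual_spend": float},
    checks={"non_negative_spend": lambda row: row["annual_spend"] >= 0},
)

if __name__ == "__main__":
    batch = [
        {"customer_id": "c1", "annual_spend": 120.0},
        {"customer_id": "c2", "annual_spend": -5.0},
    ]
    print(segments.validate(batch))
```

Keeping the checks next to the schema and owner means any consuming team can see what the product promises and who is accountable when a batch fails.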

Merlyn from Packt
18 Aug 2025
13 min read

Hugging Face’s AI Sheets, Salesforce AI’s Moirai 2.0, Amazon’s DeepFleet, Open-Source Vision-Language Model dots.ocr (1.7B Parameters)

Gemma 3 270M, Model Predictive Control (MPC) Using Python and CasADi

The 16 hour AI challenge: Learn AI, build Products & Earn $10K/Month

ChatGPT 5 just dropped and guess what? 300 million jobs became obsolete overnight. While companies are panic-firing entire departments, a small group of AI-skilled professionals are charging $10K/month as consultants to automate those same jobs. The difference? They know the frameworks, workflows, and monetization strategies that 99% of people don't.

Join Outskill's 16-Hour AI Sprint this weekend (usually for $895) and become the AI expert companies are desperately hiring, not firing. Register now for free

Date: Saturday and Sunday, 10 AM - 7 PM. Rated 9.8/10 by Trustpilot, an opportunity that makes you an AI Generalist that can build, solve & work on anything with AI.

In just 16 hours & 5 sessions, you will:
✅ Build AI Agents and custom bots that handle your repetitive work and free up 20+ hours weekly
✅ Learn how AI really works by learning 10+ AI tools, LLM models and their practical use cases.
✅ Learn to build websites and ship products faster, in days instead of months
✅ Create professional images and videos for your business, social media, and marketing campaigns.
✅ Turn these AI skills into $10K income by consulting or starting your own AI services business.

Learn million $ insights used by biggest giants like Google, Amazon, Microsoft from their practitioners 🚀🔥

Unlock bonuses worth $5100 in 2 days!
🔒 Day 1: 3000+ Prompt Bible
🔒 Day 2: Roadmap to make $10K/month with AI
🎁 Additional bonus: Your Personal AI Toolkit Builder

Join now for $0

Sponsored

Welcome to DataPro 145, your go-to guide for all things data and AI.

You might be wondering why DataPro landed in your inbox on a Monday. We’re experimenting, testing which send days work best for readers like you, while also exploring new topic areas to ensure this newsletter continues to meet your needs.
In a world where new models, frameworks, and breakthroughs arrive almost daily, staying up to date isn’t just nice to have, it’s essential. Whether you’re building, researching, or scaling AI systems, the difference between leading and lagging often comes down to who’s plugged into the right knowledge at the right time. That’s why we bring you DataPro: your weekly pulse on the launches, research, and tutorials shaping the field, curated with clarity, context, and links you can act on.

This week’s lineup brings major releases and practical guides shaping AI and data:
🔗 Hugging Face introduces AI Sheets - a no-code, local-first spreadsheet tool that makes dataset creation and enrichment as simple as typing a prompt.
🔗 Salesforce AI releases Moirai 2.0 - a decoder-only transformer setting new benchmarks in time-series forecasting with smaller, faster, and more accurate models.
🔗 Amazon unveils DeepFleet - a foundation model suite trained on billions of robot-hours to predict and optimize fleet traffic patterns in warehouses.
🔗 Meet dots.ocr - a 1.7B parameter open-source vision-language model achieving state-of-the-art multilingual OCR and document parsing across 100+ languages.

We also dive into tutorials that caught fire this week: Google’s Gemma 3 270M, built for hyper-efficient fine-tuning, and a hands-on guide to Model Predictive Control (MPC) using Python and CasADi.

👉 A full-stack edition for builders, researchers, and thinkers who thrive on fresh ideas in data and AI. Let’s unpack it.

Cheers,
Merlyn Shelley
Growth Lead, Packt

Top Tools Driving New Research 🔧📊

🔵 Q2 2025 AI Hypercomputer updates: Google Cloud’s AI Hypercomputer is redefining scale: powering Gemini, Veo 3, and serving 980T+ tokens monthly. Highlights this quarter include Dynamic Workload Scheduler, Cluster Director upgrades, llm-d v0.2, and MaxText/MaxDiffusion improvements.
Explore open frameworks, TPU/GPU scaling, and claim $300 free credit to simplify AI deployment and boost performance.

🔵 How to Test an OpenAI Model Against Single-Turn Adversarial Attacks Using deepteam? Learn how to red team OpenAI models with deepteam, an open-source toolkit offering 10+ single-turn adversarial attacks including prompt injection, jailbreaking, leetspeak, Base64, and more. This hands-on guide shows how to install dependencies, set up your API key, define vulnerabilities, and test GPT-4o-mini against real-world adversarial prompts.

🔵 Salesforce AI Releases Moirai 2.0: Salesforce’s Latest Time Series Foundation Model Built on a Decoder-only Transformer Architecture. Salesforce AI Research introduces Moirai 2.0, a decoder-only transformer that tops GIFT-Eval benchmarks for time series forecasting. It’s 44% faster, 96% smaller, yet more accurate than Moirai_large. With multi-token prediction, advanced filtering, and diverse training data, it enables scalable forecasting across IT ops, sales, demand, and supply chain planning.

🔵 Transform your data to Amazon S3 Tables with Amazon Athena: Amazon Athena now supports CTAS with S3 Tables, enabling serverless SQL-based data transformation with built-in Iceberg optimization, ACID transactions, and automatic maintenance. Easily migrate datasets (CSV, Parquet, JSON, etc.) into analytics-ready tables. The tutorial demonstrates transforming customer review data into S3 Tables, unlocking faster queries, simplified ETL, and robust enterprise-scale analytics.

🔵 Estimating from No Data: Deriving a Continuous Score from Categories. This blog is about how to derive a continuous, fine-grained score from categorical outcomes when only labeled categories are available for training.
It explains why standard classifiers fail to produce meaningful scores, and demonstrates how low-capacity networks with a linear bottleneck and category approximator head can generate interpretable, ordered risk scores.

Topics Catching Fire in Data Circles 🔥💬

🔵 Meet DeepFleet: Amazon’s New AI Models Suite that can Predict Future Traffic Patterns for Fleets of Mobile Robots. Amazon unveils DeepFleet, a suite of foundation models trained on billions of robot-hours to optimize warehouse fleets. Already enhancing operations across 300+ facilities, DeepFleet improves robot coordination, cuts congestion, and boosts efficiency by up to 10%. With RC, RF, IF, and GF architectures, it marks a leap in multi-robot forecasting.

🔵 From Deployment to Scale: 11 Foundational Enterprise AI Concepts for Modern Businesses. A guide to 11 foundational AI concepts shaping enterprise adoption, from the integration gap and RAG reality to the agentic shift and feedback flywheel. The post highlights challenges like vendor lock-in, trust, and risk, while outlining how businesses can scale AI by embedding it natively and continuously reinventing processes.

🔵 Meet dots.ocr: A New 1.7B Vision-Language Model that Achieves SOTA Performance on Multilingual Document Parsing. A new open-source vision-language model, dots.ocr (1.7B parameters), delivers state-of-the-art multilingual OCR and document parsing. Covering 100+ languages, it unifies layout detection and content recognition, preserves structure, and outputs JSON/Markdown/HTML.
Benchmarks show it surpasses Gemini2.5-Pro in table accuracy and text precision, offering scalable, production-ready document analysis under the MIT license.

🔵 Enhanced throttling observability in Amazon DynamoDB: Amazon DynamoDB introduces enhanced throttling observability, including structured exception messages with ThrottlingReasons, eight new CloudWatch metrics for detailed breakdowns, and a cost-efficient Contributor Insights mode that tracks only throttled keys. These features simplify diagnosing hot partitions, improving monitoring, and enabling faster mitigation of performance issues across tables and global secondary indexes.

New Case Studies from the Tech Titans 🚀💡

🔵 Smarter Authoring, Better Code: How AI is Reshaping Google Cloud's Developer Experience. Google Cloud is using Gemini-powered AI to accelerate documentation and code sample workflows. New systems auto-generate, validate, and test quickstarts and API code samples, ensuring accuracy and freshness at scale. By combining agentic AI systems with human oversight, Google delivers faster, more reliable developer guidance across evolving cloud services.

🔵 A Coding Guide to Build and Validate End-to-End Partitioned Data Pipelines in Dagster with Machine Learning Integration: A step-by-step guide to building an end-to-end partitioned data pipeline in Dagster, integrating raw data ingestion, cleaning, feature engineering, validation checks, and model training. Using a custom CSV-based IOManager, daily partitions, and lightweight regression, the tutorial shows how to create reproducible, modular pipelines with structured outputs and integrated machine learning.

🔵 Google AI Introduces Gemma 3 270M: A Compact Model for Hyper-Efficient, Task-Specific Fine-Tuning. Google AI introduces Gemma 3 270M, a 270M-parameter model built for hyper-efficient fine-tuning and on-device AI.
With a 256k vocabulary, INT4 quantization, and strong instruction-following out of the box, it enables privacy-preserving, domain-specific applications. Compact yet powerful, it delivers energy-efficient inference, rapid customization, and production-ready deployment across mobile, edge, and enterprise environments.

🔵 “My biggest lesson was realizing that domain expertise matters more than algorithmic complexity.” This blog is about a data scientist’s journey from corporate ML to independent AI consulting, reflecting on real-world lessons from competitions, the importance of domain expertise over algorithmic complexity, and a problem-first approach to AI adoption. It also covers mentoring advice, career path choices in data/AI, and emerging trends like text-to-speech for language preservation.

Blog Pulse: What’s Moving Minds 🧠✨

🔵 Build a deep research agent with Google ADK: This guide shows how to build an agentic lead generation system using Google’s Agent Development Kit (ADK). By orchestrating cooperative agents for pattern discovery and lead generation, it demonstrates state management, parallel research, and dynamic validation, transforming brittle scripts into intelligent, scalable workflows that mimic a market research team.

🔵 Hugging Face Unveils AI Sheets: A Free, Open-Source No-Code Toolkit for LLM-Powered Datasets. Hugging Face launches AI Sheets, a free, open-source, no-code tool that merges spreadsheets with LLM-powered data enrichment. Users can clean, transform, and generate datasets via prompts, using models like Qwen, Kimi, Llama 3, or custom local deployments. With built-in privacy, collaboration, and flexibility, it lowers barriers to AI-driven dataset creation.

🔵 Building an MCP-Powered AI Agent with Gemini and mcp-agent Framework: A Step-by-Step Implementation Guide. A hands-on guide to building an MCP-powered AI agent with Gemini and the mcp-agent framework.
The tutorial shows how to set up an MCP tool server, wire structured services like search, analysis, code execution, and weather, and integrate them with Gemini for asynchronous, extensible, and production-ready agent workflows.

🔵 Model Predictive Control Basics: A step-by-step tutorial on Model Predictive Control (MPC) using Python and CasADi. It covers the fundamentals of MPC, formulates and solves an optimal control problem (OCP), and demonstrates implementation on a double integrator system. Includes full code, closed-loop simulations, and discussion of constraints, stability, and feasibility.

See you next time!

Rubrik

Unpack critical strategies to protect patient data and ensure operational continuity...

When it comes to cyberattacks, prevention is no longer enough. You must assume breach - but that doesn't mean that you can't fight back. With the right strategies and technologies in place, you can maintain patient care even during cyber attacks. Building a resilient cyber strategy has never been more important.
Join us virtually for Rubrik’s Healthcare Summit on September 10th to gain vital insights into the challenges facing organizations and the future of cybersecurity in healthcare. Leaders like John Riggi, the National Advisor for Cybersecurity and Risk at the American Hospital Association, will cover critical topics including:

Day Zero: Navigating the Aftermath: Immediate steps post-cyberattack, exploring new recovery approaches beyond traditional methods.
From Crisis to Continuity with the Minimum Viable Hospital: Learn to define and rapidly restore core applications critical for patient care continuity.
Rubrik for Healthcare: Discover modern cyber resilience capabilities, including automated ransomware recovery and rapid data restoration.

We’ll unpack all this and more in the Healthcare Summit on September 10. Save your spot now.

Sponsored
Merlyn from Packt
07 Aug 2025
13 min read

AI First Colab Notebooks in BigQuery and Vertex AI, Gemini Code Assist in GitHub, OpenAI’s gpt-oss, Google DeepMind’s Genie 3

Anthropic’s Persona Vectors, MCP Security Survival Guide, InfiniBand vs RoCEv2

Become an AI Generalist that makes $100K (in 16 hours)

One of the biggest IT giants, TCS, laid off 12,000 people this week. And this is just the beginning of the blood bath. In the coming days you’ll see not thousands, but millions more layoffs & displacement of jobs. So what should you do right now to avoid getting affected? Invest your time in learning about AI. The tools, the use cases, the workflows, as much as you can.

Join the World’s First 16-Hour LIVE AI Upskilling Sprint for professionals, founders, consultants & business owners like you. Register Now (Only 500 free seats)

Date: Saturday and Sunday, 10 AM - 7 PM.
Rated 4.9/5 by global learners, this will truly make you an AI Generalist that can build, solve & work on anything with AI.

In just 16 hours & 5 sessions, you will:
✅ Learn how AI really works by learning 10+ AI tools, LLM models and their practical use cases.
✅ Learn to build and ship products faster, in days instead of months
✅ Build AI Agents that handle your repetitive work and free up 20+ hours weekly
✅ Create professional images and videos for your business, social media, and marketing campaigns.
✅ Turn these AI skills into $10K income by consulting or starting your own AI services business.

All by global experts from companies like Amazon, Microsoft, SamurAI and more. And it’s ALL. FOR. FREE. 🤯 🚀

$5100+ worth of AI tools across 2 days: Day 1: 3000+ Prompt Bible, Day 2: Roadmap to make $10K/month with AI, additional bonus: Your Personal AI Toolkit Builder.

Sponsored

Welcome to DataPro 144: Designing for Intelligence

The data world is shifting fast, from dashboards and notebooks to agents that reason, write code, and navigate virtual worlds.
In this issue, we look at what it means to design not just with AI, but for AI: platforms, workflows, and visualizations that collaborate, adapt, and inform with intelligence.

We explore the tools reshaping how we build, the models pushing open boundaries, and the quiet craft of designing dashboards that speak clearly in a noisy world.

🔍 Key Highlights This Issue:
📓 AI-First Colab Notebooks: Google’s Data Science Agent in Colab Enterprise (BigQuery + Vertex AI) turns prompts into pipelines, coding, debugging, and visualizing in real time.
🤖 Gemini Code Assist: GitHub PRs meet Gemini 2.5: think code reviews with instant summaries, bug detection, and smart suggestions built in.
🛡️ MCP Security Survival Guide: Why agentic systems like MCP demand new security thinking. A breakdown of real-world exploits and how to avoid them.
🧠 Anthropic’s Persona Vectors: Mapping and moderating LLM behavior. New research shows how traits like sycophancy or hallucination can be tracked and controlled during training.
🔌 InfiniBand vs. RoCEv2: A practical guide to choosing your AI network stack. Performance at scale isn't just about GPUs, it’s how fast they talk to each other.
📊 Tableau Dashboard Design: Not all dashboards are created equal. A deep dive into four design strategies (guided, exploratory, scorecard, narrative) from Learning Tableau 2025.
🧪 Post-Processing Beats Modeling? Lessons from the Mostly AI synthetic data challenge: how smart sampling and refinement outperformed complex models.
🧩 OpenAI’s gpt-oss Models: Open-weight LLMs that compete with proprietary ones. Reasoning, tool use, and safety, all on hardware you can actually run.
🌍 Google DeepMind’s Genie 3: From video generation to real-time simulated worlds, Genie 3 makes AI environments interactive, consistent, and controllable.
🌐 The Agentic Shift at Google Cloud: Not just tools, but agents, APIs, and foundations for a new AI-native enterprise.
The data platform is becoming a thinking partner.

As the boundaries between data, design, and intelligence blur, this is the moment to stay curious, stay critical, and explore what thoughtful, agentic systems can truly enable. Let’s build with intelligence, not just for it.

Sponsored
👉 Join Snyk’s Sonya Moisset on August 28 at 11:00AM ET to explore how to secure AI-powered development from code to deployment. Learn how to protect your SDLC, mitigate risks in vibe coding, and earn 1 CPE credit. Register today!
👉 Webinar alert! Mobile experts from Bitrise and Embrace break down advanced CI/CD tips and real-user insights to help you speed up builds & deliver top-quality apps. Register here.

Cheers,
Merlyn Shelley
Growth Lead, Packt

The Value of Thoughtful Dashboard Design in Tableau - by Ayushi Bulani

In the rush to build a new Tableau dashboard, it’s tempting to jump straight into charts and data. But taking a step back to define your dashboard’s purpose and strategy can make the difference between a report that confuses and one that clarifies. Put simply, effective dashboards are rooted in clear objectives and an understanding of what your audience needs at a glance.

Tableau users typically serve several audiences at once: executives who want quick insights without wading through noise, analysts who need interactive exploration, and broader audiences who need a narrative to make data relatable. A thoughtful dashboard design strategy aligns your Tableau visuals with these needs. It ensures you’re not just throwing data on a page, but actually communicating ideas. In the long run, a bit of planning on “dashboard strategy” saves time and elevates the impact of your work.

Four approaches to dashboard design

One of the key insights from the upcoming book Learning Tableau 2025 is that there isn’t a one-size-fits-all approach to dashboard design. The book’s authors outline at least four common design approaches, each suited to different scenarios.
Lightly adapted from Learning Tableau 2025, here are the four approaches and what they entail:

🔹 Guided Analysis: This approach guides the audience through the data to facilitate discovery. In practice, you lead viewers step-by-step so they can understand the data’s implications and arrive at clear actions. A guided dashboard often anticipates a specific analysis path: you’ve done the analysis and now walk the user through those findings in a logical sequence.

🔹 Exploratory: An exploratory dashboard is an open sandbox. It provides tools (filters, drill-downs, etc.) for the audience to explore the data on their own. The idea is that the data’s story may evolve over time, so you empower users to investigate trends and relationships themselves. This approach is common in self-service BI scenarios, where different users might have different questions.

🔹 Scorecard / Status Snapshot: This is all about at-a-glance information. A scorecard or status snapshot delivers a concise summary of key performance indicators (KPIs) and metrics. It’s the classic executive dashboard: think of a one-page layout with big numbers, up/down arrows, and color-coded indicators. The goal is quick problem identification and monitoring: no heavy narrative, just the vital signs of the business in one view.

🔹 Narrative: A narrative dashboard focuses on telling a story with the data. It guides the viewer through a beginning, middle, and end using visuals and text in a cohesive sequence. For example, you might show how a metric changed over time during a specific event (imagine illustrating the spread of a disease or the timeline of a marketing campaign). This approach adds context and commentary to data, making the insights memorable and compelling.

(Extracted and adapted from Learning Tableau 2025 by Milligan et al.)

Putting these approaches into practice

These different approaches matter because of their impact.
Matching your dashboard design to your audience’s needs can dramatically improve how your insights land. For instance, if your CEO just wants a daily health check of the business, a scorecard-style dashboard ensures they see all critical KPIs in seconds (and nothing more). If you’re presenting to stakeholders at a quarterly review, a narrative dashboard with a clear storyline might be more effective: it can walk them through performance drivers and outcomes in a logical flow. On the other hand, when you’re building tools for analysts or power users, an exploratory dashboard gives them the flexibility to ask their own questions about the data. And if you’ve conducted deep analysis yourself, a guided dashboard lets you package those insights into an interactive journey, so colleagues can essentially retrace your steps and findings.

Keep in mind that these approaches aren’t mutually exclusive. Often, a well-crafted dashboard will blend elements of each. You might start with a snapshot overview up top (scorecard style), then provide interactive filters for deeper exploration, and perhaps include annotations or highlights to add a mini narrative. The key is to be deliberate: know when you’re trying to simply inform versus when you need to persuade or invite exploration. By aligning the design to the goal, you avoid the common pitfalls of cluttered or directionless dashboards.

In today’s data-driven environment, dashboards are a staple of communication, and thoughtful design is what separates the mediocre from the truly effective. A bit of upfront strategy about how you present information pays off with dashboards that people actually use and understand.
Whether you’re guiding a user through a data story or letting them dive in themselves, choosing the right approach will ensure your Tableau work delivers value, not just charts.

For those who want to dive deeper and see these principles in action, the book Learning Tableau 2025 is packed with practical examples and tips on building impactful dashboards. It’s a resource well worth exploring if you’re looking to sharpen your Tableau skills and design more thoughtful, effective dashboards. By approaching your next project with a clear strategy in mind, you’ll be well on your way to creating dashboards that not only look good, but drive smarter decisions in your organization.

Want to design dashboards that communicate, not just display?

Take the Tableau dashboard design quiz to find your weak point and see how Learning Tableau 2025 can help you fix it. Take the quiz here!

Then, pre-order your copy of Learning Tableau 2025 to learn how to apply guided analysis, exploratory tools, executive snapshots, and narrative techniques in real projects, so your dashboards deliver insight with impact.

🛒 Pre-order here.

⚡ Latest Drops: Data, AI, and What’s Next

🔶 AI First Colab Notebooks in BigQuery and Vertex AI: Colab Goes Agentic! Google’s new AI-first Colab Enterprise is more than a notebook, it’s your AI teammate. With agentic capabilities via the Data Science Agent, it plans, codes, debugs, visualizes, and iterates, all with human-in-the-loop control. Seamlessly integrated with BigQuery and Vertex AI, this signals Google’s bold move to make AI not just assistive, but collaborative in real data science workflows.

🔶 Gemini Code Assist and GitHub AI code reviews: AI Code Reviews That Just Work. Gemini Code Assist turns pull requests into productivity boosters. Integrated into GitHub, it delivers instant PR summaries, flags bugs, and suggests improvements, all powered by Gemini 2.5.
With contextual understanding, interactive feedback, and high-trust suggestions, it’s more than automation, it’s collaboration. Teams like Delivery Hero are already seeing faster reviews, better code, and happier devs. Seems like the future of software quality is here, and it’s AI-reviewed.

🔶 The MCP Security Survival Guide: Best Practices, Pitfalls, and Real-World Lessons: MCP Is Powerful. That’s Also Why It’s Dangerous. Agentic systems like MCP are revolutionizing AI workflows, but they’re also exposing critical security flaws. From OAuth mishaps to remote code exploits, real-world breaches show just how risky "plug-and-play" can be. Hailey Quach’s guide is an urgent call: use MCP, but use it wisely. This isn’t just best practice, it’s survival. A must-read for anyone building secure, agentic AI infrastructure. Source: TowardsDataScience

🔶 Anthropic’s Persona Vectors: Monitoring and controlling character traits in language models. Why Your LLM Might Start Flattering You, or Worse. Anthropic’s new research on persona vectors reveals a breakthrough in tracking and controlling AI “personalities.” By isolating neural patterns tied to traits like sycophancy, hallucination, or even evil, developers can now monitor personality drift, prevent unwanted behavior during training, and flag risky datasets, without degrading performance. If AI character control is the next frontier, persona vectors might be our steering wheel.

🔶 InfiniBand vs RoCEv2: Choosing the Right Network for Large-Scale AI. Choosing the Fast Lane for AI Scale. Training massive AI models isn’t just about powerful GPUs, it’s about how fast they talk. This guide breaks down InfiniBand vs RoCEv2, the two dominant network stacks powering GPU-to-GPU communication. InfiniBand offers unrivaled speed but at a premium. RoCEv2 rides Ethernet’s rails with careful tuning. If you’re building for scale, your network isn’t infrastructure, it’s a performance multiplier.
Choose wisely.đŸ”¶ How I Won the “Mostly AI” Synthetic Data Challenge? Post-Processing for Synthetic Data Accuracy. A recent synthetic data competition highlighted the power of post-processing over model complexity. By oversampling, trimming, and iteratively refining generated data, one solution significantly improved distributional accuracy and sequence coherence. Techniques like IPF and group-level swapping outperformed ensemble modeling. The results suggest that aligning generation strategies with evaluation metrics, rather than relying solely on generative models, can be a more effective path to high-quality synthetic datasets.đŸ”¶ Introducing gpt-oss: OpenAI’s Step Toward Transparent AI: Open-Weight Models Are Growing Up. OpenAI’s release of gpt-oss-120b and gpt-oss-20b brings open-weight models closer to proprietary performance on reasoning and tool use tasks. Trained with techniques from internal frontier models, both models offer strong results across benchmarks like MMLU and HealthBench. With full customizability, modest hardware requirements, and a safety evaluation pipeline, gpt-oss models provide a flexible option for developers working on local inference, alignment research, or agentic workflows.đŸ”¶ Google DeepMind’s Genie 3: A new frontier for world models:Simulated Worlds Are Becoming Playable. Genie 3 pushes world models from static simulation to real-time interaction. Unlike earlier video generation models, it enables consistent, navigable environments at 24 FPS, complete with memory, interactivity, and controllable events. This represents a step toward open-ended training environments for agents, but also opens up new questions around scalability, fidelity, and alignment as these systems move from outputting video to becoming the world itself.đŸ”¶ New agents and AI foundations for data teams: Data Platforms Are Becoming Cognitive Partners. 
Google’s latest update positions the Data Cloud as more than infrastructure, it’s the operating system for agentic AI. With specialized data agents, unified transactional-analytical memory, and built-in reasoning, the traditional data stack is giving way to autonomous, collaborative intelligence. The shift isn’t just technical, it redefines how data work gets done, embedding agency and adaptability directly into the platforms that power decision-making at scale.See you next time!*{box-sizing:border-box}body{margin:0;padding:0}a[x-apple-data-detectors]{color:inherit!important;text-decoration:inherit!important}#MessageViewBody a{color:inherit;text-decoration:none}p{line-height:inherit}.desktop_hide,.desktop_hide table{mso-hide:all;display:none;max-height:0;overflow:hidden}.image_block img+div{display:none}sub,sup{font-size:75%;line-height:0} @media (max-width: 100%;display:block}.mobile_hide{min-height:0;max-height:0;max-width: 100%;overflow:hidden;font-size:0}.desktop_hide,.desktop_hide table{display:table!important;max-height:none!important}}

Merlyn from Packt
24 Jul 2025
11 min read

Amazon’s Mitra – Tabular Foundation Model, Qwen3-Coder-480B-A35B-Instruct, NVIDIA’s Cosmos DiffusionRenderer, DeepSeek R1 on Vertex AI

Torchvista, AWS Data Processing MCP Server, Amazon Q + DLC MCP, Streamlit + MCP, ChatGPT Agent

Become an AI Generalist that makes $100K (in 16 hours)

Still don’t use AI to automate your work & make big $$? You’re way behind in the AI race. But worry not: Join the World’s First 16-Hour LIVE AI Upskilling Sprint for professionals, founders, consultants & business owners like you.

Register Now (Only 500 free seats)

Date: Saturday and Sunday, 10 AM - 7 PM.

Rated 4.9/10 by global learners. This will truly make you an AI Generalist that can build, solve & work on anything with AI.

In just 16 hours & 5 sessions, you will:
✅ Learn the basics of LLMs and how they work.
✅ Master prompt engineering for precise AI outputs.
✅ Build custom GPT bots and AI agents that save you 20+ hours weekly.
✅ Create high-quality images and videos for content, marketing, and branding.
✅ Automate tasks and turn your AI skills into a profitable career or business.

All by global experts from companies like Amazon, Microsoft, SamurAI and more. And it’s ALL. FOR. FREE. 🤯 🚀

$5100+ worth of AI tools across 2 days. Day 1: 3000+ Prompt Bible, Day 2: Roadmap to make $10K/month with AI, additional bonus: Your Personal AI Toolkit Builder.

Register Now (Only 500 free seats)

Sponsored

Subscribe | Submit a tip | Advertise with us

Welcome to DataPro #143: From Bits to Brains - The Tools Driving the Next Wave of Intelligent Systems 🧠📡

What if your database could talk back with charts, or your containers built themselves when you spoke? What if your AI agent could say “I don’t know” and actually mean it?

This week, we dive into a new breed of tools designed not just to build smarter systems, but to understand, reason, and scale them. These aren’t just marginal upgrades, they’re foundational shifts in how we build and interact with AI.

Start with Mitra: Amazon’s tabular foundation model that ditches real-world data for synthetic priors (think causal graphs + tree ensembles) and still manages SOTA across tabular benchmarks via in-context learning.

Then check out Qwen3-Coder-480B-A35B-Instruct, a Claude-class code model with 256K native context and 1M with YaRN, engineered for repository-scale agentic reasoning.

Want BI that speaks SQL and your language? Wren AI is your GenBI agent: natural language in, SQL and insights out, thanks to a semantic layer, LLM integrations, and plug-and-play APIs.

Visual domains aren’t left out. Cosmos DiffusionRenderer from NVIDIA reinvents video re-lighting with neural inverse rendering, 70GB models, and GPU-optimized pipelines for stunning realism.

If you’re building with agents, 7 MCP Best Practices are a must-read, from schema validation to Dockerized deployments to performance tuning at scale.

Meanwhile, ChatGPT Agent blurs the line between reasoning and doing: browsing, coding, and summarizing, all on its own virtual machine.

But let’s not forget the human side. How Not to Mislead with Your Data is a masterclass on spotting narrative bias in data storytelling, and the ethical stakes behind our charts.

And yes, Cloud SQL meets Vertex AI now means vector search and Gemini are just SQL calls away. You can embed, search, and analyze, all inside your relational DB.

In the wild, Streamlit + MCP brings it all together in a sleek client interface that lets users query DeepWiki or HuggingFace-backed agents via natural language, no frontend dev required.

AWS Data Processing MCP Server takes that to an enterprise level, streamlining schema discovery, query generation, and job monitoring across Glue, Athena, and EMR, all via natural language.

Then, go deep with Amazon Q + DLC MCP: a system that automates PyTorch/TensorFlow container orchestration with a single prompt. Think: “Deploy PyTorch for multi-node training”, and it just happens.

Finally, DeepSeek R1 on Vertex AI means no GPUs needed, just an API call. Run it on-demand, serverless, pay-as-you-go, no infrastructure stress.

Still thinking of attention heads as dot products? Transformers as Addition Machines reframes attention with mechanistic interpretation, revealing layer-by-layer logic circuits.

Or maybe you prefer pictures: Torchvista lets you trace PyTorch forward passes as interactive graphs inside your notebook, a dream for debugging or demystifying hidden layers.

Semantic communication is making machines communicate with meaning, not bits. It’s the end of false alarms and overfitting to known categories, and it's all because of the knowledge graphs that reason over context and uncertainty.

And if you’re ready to start building today, Google Cloud’s top 25 guides are a treasure trove: from RAG, RLHF, and agent orchestration to CI/CD pipelines and multi-agent chat apps, code included, no excuses.

We’re in the midst of a shift: From models that classify to systems that reason. From dashboards to agents. From pixels to meaning.

This issue is your map. Dive in, experiment, build.

Sponsored: Your data, built your way with Twilio Segment, a customer data platform designed to cut through the chaos, unify your stack, and free you to focus on innovation over integration. Learn more.

Cheers,
Merlyn Shelley
Growth Lead, Packt

Top Tools Driving New Research 🔧📊

⏩ Mitra: Mixed synthetic priors for enhancing tabular foundation models. Amazon’s Mitra is a tabular foundation model (TFM) that uses in-context learning to generalize across tabular tasks without retraining. Pretrained on synthetic data from causal models and tree-based methods, rather than real-world data, Mitra achieves state-of-the-art results across benchmarks like TabRepo and TabArena. It’s open source via AutoGluon 1.4.

⏩ Qwen/Qwen3-Coder-480B-A35B-Instruct: Qwen3-Coder-480B-A35B-Instruct is Qwen’s most advanced code model, delivering Claude Sonnet-level performance on agentic coding and browser-use tasks. It supports 256K token context (extendable to 1M), tool calling, and repository-scale understanding. Built with 480B parameters (35B active), it uses in-context prompting and excels at function-call reasoning, agent frameworks, and long-horizon completions.

⏩ Wren AI is your GenBI Agent: Wren AI is a GenBI agent that lets you query databases in natural language to generate SQL, charts, and AI-driven insights instantly. It features a semantic layer for governed accuracy, integrates with top LLMs, supports embedding via API, and connects to major data sources. Fast setup, cloud and open-source options included.

⏩ nv-tlabs/cosmos1-diffusion-renderer: Cosmos DiffusionRenderer is NVIDIA’s latest video diffusion framework for high-quality image and video de-lighting and re-lighting. Built on DiffusionRenderer and powered by Cosmos, it features neural inverse and forward rendering with significant improvements in realism and control. It supports GPU-efficient inference, 70GB models, and full relighting pipelines for both static images and dynamic videos.

Topics Catching Fire in Data Circles 🔥💬

⏩ 7 MCP Server Best Practices for Scalable AI Integrations in 2025: Model Context Protocol (MCP) servers are becoming essential for secure, scalable, and agentic AI integrations. This guide outlines 7 best practices (toolset design, proactive security, schema validation, local/remote testing, Docker packaging, performance tuning, and documentation) that reduce errors, boost developer adoption, and power industry-wide AI success across finance, healthcare, e-commerce, and more.

⏩ ChatGPT Agent: Bridging Research and Action: ChatGPT Agent introduces a powerful leap in agentic AI: it can now think and act on your behalf using its own virtual computer, navigating websites, running code, analyzing data, and producing editable outputs like slides and spreadsheets. It integrates browsing, terminals, APIs, and tool access to complete complex real-world tasks autonomously.

⏩ How Not to Mislead with Your Data-Driven Story? Data storytelling helps us understand the world, but it can also mislead. This piece explores how persuasive narratives, even with accurate data, can distort truth. It highlights narrative bias risks like selection, framing, and interpretation, and urges data professionals to balance emotional storytelling with clarity, ethics, and rigorous data literacy.

⏩ Integrate your Cloud SQL for MySQL instance with Vertex AI and vector search: Google Cloud’s Cloud SQL for MySQL now supports vector embeddings and Vertex AI integration, empowering developers to run AI-powered search and analysis directly in SQL. You can generate, store, and search vector embeddings with native SQL functions, perform ANN search, and invoke Gemini or custom Vertex AI models to assess customer sentiment or predict behavior, all within your database.

New Case Studies from the Tech Titans 🚀💡

⏩ MCP Client Development with Streamlit: Build Your AI-Powered Web App. This tutorial walks you through building a Streamlit-based MCP client interface that connects to remote MCP servers like DeepWiki and HuggingFace. The client lets users input topics and receive AI-generated summaries or recommendations via OpenAI’s API. It covers setup, secure key handling, MCP tool integration, and UI design, enabling rapid, modular deployment of AI-powered web tools.

⏩ Accelerating development with the AWS Data Processing MCP Server and Agent: The AWS Data Processing MCP Server simplifies complex analytics workflows by enabling AI-driven natural language interactions with services like AWS Glue, Athena, and EMR. Built on the Model Context Protocol (MCP), it abstracts multi-service orchestration, automating tasks like schema discovery, query generation, reporting, and monitoring. Developers can integrate it via Amazon Q CLI or Claude Desktop to streamline onboarding, accelerate insight generation, and enhance observability.

⏩ Streamline deep learning environments with Amazon Q Developer and MCP: Amazon Q + the DLC MCP Server radically simplifies how AI/ML teams manage Deep Learning Containers. Instead of manually customizing, testing, and deploying DLCs for PyTorch or TensorFlow, developers can now use natural language via Amazon Q CLI to automate everything, from image selection to ECR deployment, distributed training, and environment troubleshooting. It turns container operations into secure, conversational workflows.

⏩ Deepseek R1 is available for everyone in Vertex AI Model Garden: DeepSeek R1 is now available on Vertex AI’s Model-as-a-Service (MaaS) platform, enabling businesses to access this powerful open model without managing GPU infrastructure. With just a few clicks or API calls, teams can test and deploy DeepSeek via a serverless, pay-as-you-go model. Vertex AI handles security, scalability, and compliance, accelerating AI innovation with zero infrastructure overhead.

Blog Pulse: What’s Moving Minds 🧠✨

⏩ Transformers (and Attention) are Just Fancy Addition Machines: Mechanistic interpretation is a novel AI interpretability approach that goes beyond tools like SHAP and LIME by uncovering how neural networks compute, not just what features influence outputs. It traces how features are encoded and transformed across layers, especially in transformers. By reimagining multi-head attention as additive rather than concatenative, it enables circuit-level analysis of neuron behavior. This method reveals the internal logic of models, opening doors to deeper understanding, debugging, and trust in complex AI systems.

⏩ Torchvista: Building an Interactive Pytorch Visualization Package for Notebooks. Torchvista is an open-source tool for interactively visualizing the forward pass of PyTorch models inside web-based notebooks like Colab or Jupyter. Unlike static tools, it offers zoomable, modular graph views, supports error-tolerant partial visualizations, and requires just a one-line trace_model() call. It traces tensor flows and module hierarchies during forward execution and renders them as interactive, nested graphs using JS libraries like D3 and Graphviz, making complex models understandable, debuggable, and more accessible for iterative development and exploration.

⏩ From Rules to Relationships: How Machines Are Learning to Understand Each Other? Semantic communication shifts focus from transmitting raw bits to conveying meaning, crucial in modern, machine-heavy networks. Traditional SKB systems compress messages via fixed categories, but fail in unfamiliar scenarios. Knowledge graph-based semantic communication fixes this by modeling relationships between entities, enabling contextual reasoning. This allows systems to intelligently handle edge cases (e.g., maintenance workers during off-hours) by inferring intent and suggesting verification over false alarms. Though graph systems require more compute and expertise, they vastly improve real-world accuracy, adaptability, and decision-making in noisy, dynamic environments.

⏩ 25 top how-to guides for Google Cloud: The best way to learn AI is to build it, and Google Cloud now offers a curated collection of 25+ hands-on how-to guides to help you do just that. From deploying large models like Llama 3 and DeepSeek on high-performance infrastructure, to creating advanced gen AI apps, fine-tuning with RAG and RLHF, and integrating agents with real-world systems, this living resource accelerates your AI journey. Each guide includes code, tools, and best practices, ready to help you build smarter, faster, and at scale.

See you next time!
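A footnote on the "Fancy Addition Machines" item in this issue: the additive reading of multi-head attention rests on a plain linear-algebra identity. Concatenating the head outputs and multiplying by the output matrix gives the same result as multiplying each head by its own slice of that matrix and summing. The sketch below checks this on toy numbers; the function names, shapes, and values are illustrative assumptions, not code from the article.

```python
# Toy check: concat(heads) @ W_O == sum_i head_i @ W_O[slice_i].
# All names and numbers here are illustrative assumptions.

def matvec(vec, mat):
    """Multiply a row vector (length n) by an n x m matrix given as nested lists."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(len(mat[0]))]

def concat_then_project(head_outputs, w_o):
    """Concatenative view: join all head outputs, then apply W_O once."""
    concatenated = [x for head in head_outputs for x in head]
    return matvec(concatenated, w_o)

def project_then_add(head_outputs, w_o):
    """Additive view: each head uses its own block of rows of W_O; results are summed."""
    d_head = len(head_outputs[0])
    total = [0.0] * len(w_o[0])
    for i, head in enumerate(head_outputs):
        w_o_slice = w_o[i * d_head:(i + 1) * d_head]
        contribution = matvec(head, w_o_slice)
        total = [t + c for t, c in zip(total, contribution)]
    return total

# Two heads of width 2, projected into a model dimension of 3.
head_outputs = [[1.0, 2.0], [3.0, -1.0]]
w_o = [
    [1.0, 0.0, 2.0],
    [0.0, 1.0, -1.0],
    [0.5, 0.5, 0.0],
    [2.0, -2.0, 1.0],
]

print(concat_then_project(head_outputs, w_o))
print(project_then_add(head_outputs, w_o))
```

Because each head's contribution is a separate term in the sum, it can be inspected on its own, which is what makes circuit-level analysis tractable.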

Merlyn from Packt
16 Jul 2025
6 min read

Amazon EKS now scales to 100K nodes, AutoKeras/Keras Tuner, Streamlit apps to AWS, Strands Agents 1.0

NVIDIA’s Audio Flamingo 3, GoogleSQL’s new pipe syntax, MetaStone-S1, Fractional Reasoning

An Exclusive Look into Next Gen BI: Live Webinar

Dashboards alone aren’t cutting it. The market’s moving toward something new: data apps, live collaboration, and AI that works the way teams actually work.

See what's driving the rise of Next Gen BI, how Sigma earned a top debut on the Gartner Magic Quadrant, and what’s next for our roadmap.

Secure Your Spot

Sponsored

Subscribe | Submit a tip | Advertise with us

Welcome to DataPro 142: Tools Driving Tomorrow’s Thinking 🔬📈

In this edition, we spotlight the breakthrough tools, patterns, and practices that are reshaping research and production in AI and data science.

From NVIDIA’s Audio Flamingo 3 pushing the frontier of multimodal reasoning, to Fractional Reasoning’s elegant solution to adaptive LLM compute, and MetaStone-S1’s bold performance claims, this week’s releases are not just incremental; they’re foundational. Meanwhile, Kiro is redefining the dev experience, merging agentic coding with production-readiness from day one.

On the systems front, Amazon EKS now scales to 100K nodes, opening the door to AGI-class workloads. And GoogleSQL’s new pipe syntax is winning hearts in the SQL community for its clarity and composability. If you’ve ever loathed nested subqueries, this is your moment.

For those making decisions about tooling, don’t miss our link on Foundation vs. Custom Models, a smart, grounded guide for teams navigating performance vs. control. Also featured: Amazon SageMaker’s new unified catalog, practical AutoML with AutoKeras/Keras Tuner, and a no-fuss walkthrough of deploying Streamlit apps to AWS.

Lastly, we dive into deeper reflections: Strands Agents 1.0 brings multi-agent orchestration into the real world, and standout articles explore paradox pitfalls in metrics, and how data’s 40-year evolution is shaping AI’s next wave.

Let’s get into it. ⬇️

Cheers,
Merlyn Shelley
Growth Lead, Packt

Top Tools Driving New Research 🔧📊

🔵 nvidia/audio-flamingo-3: Audio Flamingo 3 (AF3) is an open Large Audio-Language Model (LALM) by NVIDIA for research use, capable of reasoning across speech, sound, and music. It supports long audio inputs, multi-turn voice dialogue, and chain-of-thought reasoning, achieving state-of-the-art results on 20+ tasks through unified audio representation and extensive dataset training.

🔵 Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute: Fractional Reasoning introduces a model-agnostic, training-free method to dynamically adjust LLM reasoning depth at inference. By scaling latent steering vectors, it tailors compute per input complexity, boosting accuracy and efficiency. Compatible with Best-of-N, majority vote, and self-reflection, it outperforms fixed prompts across GSM8K, MATH500, and GPQA benchmarks.

🔵 MetaStone-AI/MetaStone-S1: MetaStone-S1 is a 32B-parameter reflective generative model that rivals OpenAI-o3-mini on math, code, and Chinese reasoning. It combines Long-CoT Reinforcement and Process Reward Learning for efficient, high-quality inference. MetaStone-S1 achieves deep reasoning while reducing policy model costs by 99%, enabling fast, accurate outputs across multiple benchmarks.

🔵 Introducing Kiro: Kiro is an agentic IDE that turns AI prototypes into production-grade apps using spec-driven development. It auto-generates requirements, design docs, and implementation tasks, and uses hooks for event-based automation. With built-in test coverage, design clarity, and consistency checks, Kiro helps developers ship reliable software faster and with greater confidence.

Topics Catching Fire in Data Circles 🔥💬

🔵 Do You Really Need a Foundation Model? Not every use case needs a foundation model. This guide compares foundation and custom models across performance, cost, latency, and control. It offers a decision framework, practical examples, and hybrid strategies to help teams choose the right approach, balancing rapid prototyping with long-term scalability, privacy needs, and task-specific optimization.

🔵 Automating Deep Learning: A Gentle Introduction to AutoKeras and Keras Tuner. This guide introduces AutoKeras and Keras Tuner, two AutoML tools that simplify deep learning. AutoKeras automates architecture and training, while Keras Tuner optimizes hyperparameters of custom models. Together, they streamline experimentation, reduce guesswork, and boost performance, ideal for tasks like image classification, tabular modeling, or rapid prototyping with minimal manual tuning.

🔵 Amazon EKS enables ultra scale AI/ML workloads with support for 100K nodes per cluster: Amazon EKS now supports up to 100,000 nodes per cluster, enabling ultra-scale AI/ML workloads with 1.6M Trainium or 800K GPU instances. This breakthrough powers large model training, reduces operational costs, and preserves Kubernetes compatibility, paving the way for AGI-scale innovation through enhanced orchestration, resiliency, and open-source flexibility.

🔵 Exploring pipe syntax real-world use cases: GoogleSQL's pipe syntax reimagines SQL with a linear, readable data flow using the |> operator. It simplifies complex queries, streamlines data pipelines, and improves log analysis clarity. By eliminating nested structures and enabling intuitive chaining, pipe syntax boosts productivity, maintainability, and accelerates insight generation across BigQuery and Cloud Logging workflows.

New Case Studies from the Tech Titans 🚀💡

🔵 How Metrics (and LLMs) Can Trick You: A Field Guide to Paradoxes. This article unpacks how paradoxes like Simpson’s, the Accuracy Paradox, and Goodhart’s Law mislead both data science and LLM evaluation. It shows how surface-level metrics can distort truth, urging practitioners to embrace contextual, nuanced measurement, especially in BI and Retrieval-Augmented Generation, where incentives, imbalance, and aggregation errors can derail decision-making.

🔵 What Can the History of Data Tell Us About the Future of AI? This sweeping 40-year history of data explores how shifts in storage, architecture, and business models have shaped intelligent systems. By tracing personal, public, and enterprise data, from PCs to cloud to AI, the piece reveals how incentives, infrastructure, and data ownership will determine the trajectory of AI’s future.

🔵 Streamline the path from data to insights with new Amazon SageMaker Catalog capabilities: Amazon SageMaker now streamlines analytics with new integrations: QuickSight for in-studio dashboarding, S3 Access Grants for secure unstructured data sharing, and automatic onboarding of Glue Data Catalog datasets. These updates unify structured and unstructured data, accelerating workflows from raw data to insights, governed, discoverable, and ready for ML and BI use.

Blog Pulse: What’s Moving Minds 🧠✨

🔵 Deploy a Streamlit App to AWS: This hands-on guide walks you through deploying a Streamlit app on AWS using Elastic Beanstalk. It covers preparing your code, switching from Postgres to S3 for data, configuring AWS infrastructure, and managing deployment steps. Ideal for developers needing scalable, secure alternatives to public cloud endpoints like Streamlit Community Cloud.

🔵 Accuracy Is Dead: Calibration, Discrimination, and Other Metrics You Actually Need. This guide challenges accuracy as a primary evaluation metric, urging data scientists to adopt deeper, problem-specific tools. It explores advanced classification metrics like ROC-AUC, log loss, and Brier score, and regression metrics like R², RMSLE, and quantile loss, emphasizing calibration, uncertainty, and decision-readiness over surface-level model performance.

🔵 Introducing Strands Agents 1.0: Production-Ready Multi-Agent Orchestration Made Simple: Strands Agents 1.0 is a production-ready SDK for building multi-agent AI systems. It introduces primitives like Agents-as-Tools, Swarms, Graphs, and A2A support for inter-agent communication. With session persistence, async performance, and flexible model integration, Strands simplifies orchestration, scaling from prototype to production for complex, collaborative, and distributed agentic workflows.

See you next time!
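A footnote on the "Accuracy Is Dead" item in this issue: a small sketch of why probability-aware metrics can separate models that accuracy treats as identical. It uses the standard definitions of the Brier score (mean squared error of predicted probabilities) and binary log loss; the labels and the two sets of predicted probabilities are made-up toy values, not data from the article.

```python
import math

def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of correct labels after thresholding the probabilities."""
    preds = [1 if p >= threshold else 0 for p in y_prob]
    return sum(int(p == y) for p, y in zip(preds, y_true)) / len(y_true)

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(y_prob, y_true)) / len(y_true)

def log_loss(y_true, y_prob, eps=1e-15):
    """Binary cross-entropy; probabilities are clipped to avoid log(0) (lower is better)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

# Toy data (assumed for illustration): both models rank every case correctly.
y_true = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]
hesitant = [0.6, 0.4, 0.55, 0.45]

for name, probs in [("confident", confident), ("hesitant", hesitant)]:
    print(name, accuracy(y_true, probs), brier_score(y_true, probs), log_loss(y_true, probs))
```

Both toy models score the same accuracy, while the Brier score and log loss differ because they take the predicted probabilities themselves into account.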
Merlyn from Packt
09 Oct 2025
5 min read

What’s new in Airflow 3.1 – HITL, expanded plugins, and more

Join Airflow experts on October 22 for a live walkthrough of 3.1’s biggest updates

👋 Hello,

Your Airflow workflows just got smarter. Airflow 3.0 was the biggest release in the project’s history. Now, Airflow 3.1 introduces new features built for today’s challenges, from orchestrating GenAI pipelines that need human intervention to customizing your UI for non-technical stakeholders.

Join Airflow experts on October 22 for a live walkthrough of all the updates you need to know:
• Human-in-the-loop enables mid-run workflow intervention including branch selection, task output approval/rejection, and text input for downstream tasks
• React-based plugin interface extends the plugin system with support for React apps, external views, dashboard integrations, and menu integrations
• UI improvements like favorited DAGs, internationalization, restored Gantt charts, and performance insights

Register Now

Whether you’re already running 3.0 or waiting to see what's worth upgrading for, this webinar will get you up to speed on all of the latest changes and allow you to get your Airflow 3 questions answered live.

Can’t join live? Register anyway and we’ll send you the recording.

Sign Up for the Recording

Sponsored

Subscribe | Submit a tip | Advertise with Us

Welcome to DataPro Expert Insights #153!

In this week’s Expert Edition, we’re excited to feature Eric Narro, Analytics Engineer and author of Getting Started with Taipy. Eric introduces a fresh perspective on how to move from time series to chatbots and bring your Python models to life with Taipy.

For those new to it, Taipy is a Python application builder with one clear promise: to help you deploy data applications in real production environments. It’s the ideal tool for creating scalable, interactive apps that turn your models, analytics, and algorithms into powerful end-user experiences. Whether you’re building dashboards, optimization tools, or AI-powered chatbots, Taipy empowers data professionals to go from prototype to production with confidence.

Let’s dive in!

Cheers,
Merlyn Shelley
Growth Lead, Packt

Meet Taipy: A Pure-Python, Fast, and Scalable Application Builder
By Eric Narro

From Time Series to Chatbots: Bring your Python Models to Life with Taipy

Taipy is a Python application builder with one clear promise: deploy your data applications in real production environments. It’s the ideal tool for creating scalable, interactive apps that bring your models, analytics, and algorithms to life. Whether you’re building dashboards, optimization tools, or AI-powered chatbots, Taipy helps data professionals turn prototypes into powerful, end-user applications. With Getting Started with Taipy, you’ll learn how to build complete applications from the ground up, deploy them confidently, and explore real-world examples and advanced use cases that showcase Taipy’s full potential.

Python has long been the go-to language for data professionals, not because they’re developers, but because Python makes complex work accessible. Analysts, data scientists, and AI engineers use it to model data, run analytics, and visualize results.

But when it comes to turning those models into real applications for end users, things get tricky. Building a web app the traditional way, with backend frameworks, databases, and front-end stacks, is often out of reach for data teams. It demands skills, time, and coordination that slow everything down and increase costs.

Tools like Power BI or Tableau help visualize data, but they can’t truly run Python code or offer the flexibility of a full application. Python frameworks like Streamlit, Dash, Panel, or Gradio solve the problem partially. Each has trade-offs. To give an example, Streamlit is a great library for prototyping: it’s very easy to learn, and you can create demos in no time. While you can take Streamlit applications to production, they are harder to scale because they don’t optimize the way code runs, and they run on their own server (you can’t run them in a WSGI server). In practice, this means Streamlit applications serve end users well only when usage is light or when the app doesn’t need to process large amounts of data.

That’s where Taipy comes in!

Taipy lets you create scalable, production-grade applications directly in Python. Whether for time series, optimization, geospatial analysis, or even LLM chatbots, Taipy is designed for performance and scalability. You can deploy Taipy apps on WSGI servers, handle multiple users efficiently, and still build everything using pure Python.

Continue reading the full article on our Packt Medium Handle here.

See you next time!
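For readers unfamiliar with the term used above, a WSGI server is any server that can host a Python callable following the WSGI interface (PEP 3333). The sketch below is a generic, standard-library-only illustration of that interface. It is not Taipy code and does not show how Taipy exposes its application object; the `application` and `call_app` names are assumptions made for this example.

```python
from wsgiref.util import setup_testing_defaults

def application(environ, start_response):
    """A minimal WSGI callable: receives the request environ, returns an iterable of bytes."""
    body = b"Hello from a WSGI app\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]

def call_app(app, path="/"):
    """Invoke a WSGI app in-process, the way a WSGI server would, and collect the result."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys with test values
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body

status, body = call_app(application)
print(status, body)
```

Any framework that can hand over a callable of this shape can sit behind a production WSGI server with multiple workers, which is the deployment property the article attributes to Taipy.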

Merlyn from Packt
09 Jul 2025
11 min read

SmolLM3, Hugging Face’s small-but-mighty multilingual model with 128k-token context, MLarena, a diagnostic-rich, algorithm-agnostic toolkit

Microsoft’s Copilot Chat goes open-source, Beyond Prompts: The Rise of Context Engineering

Together with Growth School & Infinite Uptime

Join this 16 hour AI Learning Sprint to become an AI Genius (worth $895 but $0 today)

The AI race is getting faster & dirtier day by day. Things we could never have imagined are happening.

- Thousands of people are getting laid off every day
- People are building 1-person million dollar companies
- Tech giants are fighting for AI talent

Meta just poached OpenAI’s 4 top researchers.

So if you’re not learning AI today, you probably won't have a job in the next 6 months.

That’s why you need to join the 3-Day Free AI Mastermind by Outskill, which comes with 16 hours of intensive training on AI frameworks, building with sessions, creating images and videos etc. that will make you an AI expert. Originally priced at $895, but the first 100 of you get in for completely FREE! Extended 4th of July SALE! 🎁

📅 FRI-SAT-SUN: Kick Off Call & Live Sessions
🕜 10AM EST to 7PM EST
✅ Trusted by 4M+ learners

In the 5 sessions, you will:
✅ Master prompt engineering to get the best out of AI.
✅ Build custom GPT bots & AI agents for email management to save you 20+ hours weekly.
✅ Create high-quality images and videos for PPTs, marketing, and branding.
✅ Monetise your AI skills into a $10,000/mo business.

All by global experts from companies like Amazon, Microsoft, SamurAI and more. And it’s ALL. FOR. FREE. 🤯 🚀

Join now and get $5100+ in additional bonuses

$5100+ worth of AI tools across 3 days (Day 1: 3000+ Prompt Bible, Day 2: Roadmap to make $10K/month with AI, Day 3: Your Personal AI Toolkit Builder).

Sponsored

Welcome to DataPro #141 ~ Engineering Intelligence, Not Just Models

In this landmark edition, we go beyond algorithms and hyperparameters to explore how data science is evolving into a discipline of system design, orchestration, and reasoning. As GenAI shifts the boundaries of what’s possible, the conversation is no longer about what model to use, but how we structure intelligence itself.

Our feature deep dive, “Beyond Prompts: The Rise of Context Engineering” by Rahul Singh, Data Science Manager at Adobe, challenges the prompt-centric mindset and introduces Context Engineering as a foundational pillar for building scalable, intelligent agents.
If you’re architecting the future of enterprise AI, this is essential reading.

Also inside:
– Build a fully autonomous multi-agent system with Python, OpenAI API, and PrimisAI Nexus
– Explore SmolLM3, Hugging Face’s small-but-mighty multilingual model with 128k-token context
– Microsoft’s Copilot Chat goes open-source, offering powerful AI pair programming to everyone
– Google’s MCP Toolbox simplifies secure, schema-aware database access for AI agents
– A technical teardown of Shazam’s algorithmic magic, from FFT to hash matching
– How POSETs in Python provide better multi-criteria decisions than rankings
– Launch smarter ML pipelines with MLarena, a diagnostic-rich, algorithm-agnostic toolkit
– Unlock true concurrency with free-threaded Python 3.13 and StaticFrame for blazing-fast row ops

Whether you're scaling models, building infrastructure, or shaping AI policy, this issue delivers insights for every data scientist at the frontier.

✉ Have tips or tools to share? Reply and contribute to our next edition.

Cheers,
Merlyn Shelley
Growth Lead, Packt

Unlock 99.97% Availability with PlantOS: Production Reliability, Redefined

PlantOS Manufacturing Intelligence is powering the next era of industrial performance, delivering 99.97% equipment availability and up to 2% energy savings per unit produced. From steel to cement, manufacturers worldwide are turning fragmented data into confident decisions across every layer of production, from parameter to plant to global scale.

Experience Infinite Uptime Now

Sponsored

Beyond Prompts: The Rise of Context Engineering

Why context engineering is the next frontier in building smarter, more reliable AI systems.

Written by Rahul Singh, Data Science Manager @Adobe.
Over my seven-plus-year career in data science, working on projects ranging from customer-value measurement to product analytics and personalization, one question has remained constant through it all:

Do we have the right data, and can we trust it?

With the rapid rise of Generative AI, that question hasn’t disappeared; it’s become even more urgent. As AI systems evolve from proof-of-concept assistive chatbots to autonomous agents capable of reasoning and acting, their success increasingly depends not on how complex or powerful they are, but on how well they understand the context in which they operate.

In recent weeks, leaders like Tobi Lütke (CEO of Shopify), Andrej Karpathy (former Director of AI at Tesla), and others have spotlighted this shift. Lütke’s tweet was widely reshared, including by Karpathy, who elaborated on it further. He emphasized that context engineering is not about simple prompting, but about carefully curating, compressing, and sequencing the right mix of task instructions, examples, data, tools, and system states to guide intelligent behavior. This emerging discipline, still poorly understood in most organizations, is quickly becoming foundational to any serious application of generative AI.

This growing attention to context engineering signals a broader shift underway in the AI landscape. For much of the past year, prompt engineering dominated the conversation, shaping new job titles and driving a surge in hiring interest. But that momentum is tapering. A Microsoft survey across 31 countries recently ranked “Prompt Engineer” near the bottom of roles companies plan to hire (Source). Job search trends reflect the change as well: according to Indeed, prompt-related job searches have dropped from 144 per million to just 20–30 (Source).

But this decline doesn’t signal the death of prompt engineering by any means. Instead, it reflects a field in transition.
As use cases evolve from assistive to agentic AI, meaning systems that can plan, reason, and act autonomously, the core challenge is no longer just about phrasing a good prompt. It’s about whether the model has the right information, at the right time, to reason and take meaningful action.

This is where Context Engineering comes in!

If prompt engineering is about writing the recipe (carefully phrased, logically structured, and goal-directed), then context engineering is about stocking the pantry, prepping the key ingredients, and ensuring the model remembers what’s already been cooked. It’s the discipline of designing systems that feed the model relevant data, documentation, code, policies, and prior knowledge, not just once, but continuously and reliably.

In enterprises, where critical knowledge is often proprietary and fragmented across various platforms, including SharePoint folders, Jira tickets, Wiki pages, Slack threads, Git repositories, emails, and dozens of internal tools, the bottleneck for driving impact with AI is rarely the prompt. It’s the ingredients missing from the pantry: the right data, delivered at the right moment, in the right format. Even the most carefully crafted prompt will fall flat if the model lacks access to the organizational context that makes the request meaningful, relevant, and actionable.

And as today’s LLMs evolve into Large Reasoning Models (LRMs), and agentic systems begin performing real, business-critical tasks, context becomes the core differentiator. Models like OpenAI’s o3 and Anthropic’s Claude Opus 4 can handle hundreds of thousands of tokens in one go. But sheer capacity is not enough to guarantee success. What matters is selectively injecting the right slices of enterprise knowledge: source code, data schemas, metrics, KPIs, compliance rules, naming conventions, internal policies, and more.

This orchestration of context is not just document retrieval; it’s evolving into a new systems layer.
Instead of simply fetching files, these systems now organize and deliver the right information at the right step, sequencing knowledge, tracking intermediate decisions, and managing memory across interactions. In more advanced setups, supporting models handle planning, summarization, or memory compression behind the scenes, helping the primary model stay focused and efficient. These architectural shifts are making it possible for AI systems to reason more effectively over time and across tasks.

Without this context layer, even the best models stall on incomplete or siloed inputs. With it, they can reason fluidly across tasks, maintain continuity, and deliver compounding value with every interaction.

Case in point: This isn’t just theory. One standout example comes from McKinsey. Their internal GenAI tool, Lilli, is context engineering in action. The tool unifies over 40 knowledge repositories and 100,000+ documents into a single searchable graph. When a consultant poses a question, it retrieves the five to seven most relevant artifacts, generates an executive summary, and even points to in-house experts for follow-up. This retrieval-plus-synthesis loop has driven ~72% firm-wide adoption and saves teams ~30% of the time they once spent hunting through SharePoint, wikis, and email threads, proof that the decisive edge isn’t just a bigger model, but a meticulously engineered stream of proprietary context (Source).

What Does Context Actually Mean in the Enterprise?

By now, it’s clear that providing the right context is key to unlocking the full potential of AI and agentic systems inside organizations. But “context” isn’t just a document or a code snippet; it’s a multi-layered, fragmented, and evolving ecosystem.
In real-world settings, it spans everything from database schemas to team ownership metadata, each layer representing a different slice of what an intelligent system needs to reason, act, and adapt effectively.

Based on my experience working across hundreds of data sources and collaborating with cross-functional product, engineering, and data teams, I’ve found that most enterprise context and information fall into nine broad categories. These aren’t just a checklist; they form a mental model: each category captures a dimension of the environment that AI agents must understand, depending on the use case, to operate safely, accurately, and effectively within your organization.

Read the full article on Packt’s Medium. If you’re new, make sure to follow our Medium handle and subscribe to our newsletter for more insights like this!

📈 Patterns & Practice: What’s Moving the World of Data & ML

⭕ Implementing a Tool-Enabled Multi-Agent Workflow with Python, OpenAI API, and PrimisAI Nexus: Learn how to implement a multi-agent AI system using Python, OpenAI API, and PrimisAI Nexus. The tutorial covers setting up hierarchical supervision, defining structured JSON schemas, and integrating tools for code validation, statistical analysis, and documentation search. Agents collaborate to automate complex workflows across planning, development, QA, and data analysis with scalable, role-based coordination.

⭕ Hugging Face Releases SmolLM3: A 3B Long-Context, Multilingual Reasoning Model: Hugging Face's SmolLM3 is a compact 3B-parameter multilingual model offering SoTA reasoning, tool use, and 128k-token context handling. Released in base and instruct variants, it rivals 7B+ models across benchmarks like XQuAD and MGSM. SmolLM3 is ideal for multilingual RAG, agent workflows, and edge deployments, delivering powerful performance with efficiency and accessibility.

⭕ Microsoft Open-Sources GitHub Copilot Chat Extension for VS Code: Now Free for All Developers: Microsoft has open-sourced the GitHub Copilot Chat extension for VS Code under the MIT license, unlocking premium AI coding tools for free. With Agent Mode, Edit Mode, predictive Code Suggestions, and in-editor Chat, developers gain powerful automation, multi-file editing, and contextual assistance, paving the way for customizable, AI-enhanced workflows across open-source and enterprise environments.

⭕ Google AI Just Open-Sourced a MCP Toolbox to Let AI Agents Query Databases Safely and Efficiently: Google’s new MCP Toolbox for Databases simplifies secure, schema-aware SQL integration for AI agents with just a few lines of Python. Part of the open-source GenAI Toolbox, it supports PostgreSQL/MySQL, MCP-compliant interfaces, connection pooling, and safe query generation, enabling reliable database access for LLM workflows in analytics, customer support, DevOps, and enterprise automation.

⭕ The Five-Second Fingerprint: Inside Shazam’s Instant SongID: Part of the Behind the Tap series, this deep dive unpacks how Shazam identifies songs in seconds using audio fingerprinting, FFT-based spectrograms, and hash matching. It explains the journey from a tap to real-time song recognition, reveals Shazam’s scalable architecture, and explores its industry impact, from music discovery to market insights used by Apple and record labels.

⭕ POSET Representations in Python Can Have a Huge Impact on Business: POSETs (Partially Ordered Sets) offer a powerful alternative to traditional ranking systems by preserving multidimensional relationships without forcing a linear order. This post shows how POSETs can improve decision-making by avoiding arbitrary weighting and oversimplification, using Python and the Wine Quality dataset to build dominance matrices, Hasse diagrams, and interpret incomparability across samples.

⭕ Build Algorithm-Agnostic ML Pipelines in a Breeze: MLarena is a newly open-sourced, algorithm-agnostic machine learning toolkit built on MLflow for training, evaluating, tuning, and deploying models. It balances automation with expert control, offering built-in diagnostics, explainability tools, robust hyperparameter optimization via Bayesian search, and seamless MLflow integration. MLarena simplifies end-to-end ML workflows while enhancing model transparency, stability, and reproducibility.

See you next time!