Yann LeCun’s warning, Anthropic’s expansion, and a deeper look at agent evaluation
AI_Distilled #141: What’s New in AI This Week
Vector memory for AI agents in air-gapped, regulated, and offline environments
VectorAI DB delivers sub-15ms retrieval for agent memory and RAG pipelines on your own infrastructure: on-premises, at the edge, or air-gapped. Native support for LangChain, LlamaIndex, and Hugging Face. Free Community Edition available.
Get Started for Free
LLM Expert Insights, Packt
LATEST DEVELOPMENTS
📉 AI pioneer warns of industry bubble as costs outpace revenues - AI researcher Yann LeCun has criticized Elon Musk’s xAI as a struggling competitor in the race for frontier AI, while warning that the broader industry risks a “big bubble explosion” if leading labs fail to reduce costs or raise prices. LeCun argued that today’s AI services remain heavily subsidized by investors and suggested that more advanced AI systems may ultimately require new architectures beyond large language models.
🧠 MIT gives robots a memory that works more like ours - MIT researchers have developed a new memory framework that allows robots to remember objects, locations, and past observations using natural language, enabling them to answer questions such as “Where did I leave my wallet?” By combining 3D mapping with AI-generated descriptions, the system could help future robots navigate complex environments and collaborate more naturally with humans.
🌏 Microsoft becomes the primary gateway for OpenAI models in China - While OpenAI and Anthropic have largely stayed out of the Chinese market, Microsoft has emerged as the main supplier of OpenAI’s models to major Chinese technology companies through Azure. The arrangement highlights Microsoft’s unique position in the global AI ecosystem, even as concerns grow around model distillation, geopolitical tensions, and the flow of advanced AI capabilities across national boundaries.
🇰🇷 Anthropic expands into South Korea with new office and AI partnerships - Anthropic has opened a Seoul office and announced partnerships with major Korean organizations, including NAVER, Samsung SDS, LG CNS, and Nexon, as demand for Claude continues to grow across the region. The company also signed an agreement with South Korea’s Ministry of Science and ICT to collaborate on AI safety, cybersecurity, and responsible AI adoption.
⚖️ Study highlights why AI still struggles to moderate online hate speech - New research shows that leading AI moderation systems often disagree on what constitutes hate speech, producing inconsistent results across demographic groups and content types. While AI can detect explicit abuse at scale, researchers say it still struggles with context, sarcasm, coded language, and reclaimed terms, underscoring the challenges of relying on automated systems for online content moderation.
Claude is currently the most powerful tool of 2026. Yet almost no one knows how to actually use it. Our expert mentors have condensed 800+ hours of Claude research, articles, YouTube content, and real-world practice into a focused 16-hour curriculum.
Join the 2-Day Claude AI Mastery Workshop: a live, end-to-end deep dive into Claude plus 10+ AI tools, LLMs, and workflows. You will learn how to:
- Master Claude’s three modes: Chat, Cowork, and Code.
- Set up Skills, Connectors, and Plug-ins to automate your desktop, Notion, and files.
- Vibe-code apps and dashboards without writing code.
- Work with 10+ AI tools and workflows that pair with Claude.
🧠 Saturday & Sunday 🕜 10 AM – 7 PM EST
Register NOW!
📈 EXPERT INSIGHTS
Why a Good Answer Doesn’t Mean a Good Agent
During a time when AI conversations are often louder than they are useful, Ammar Mohanna, PhD, brings a refreshing perspective.
His career has moved fluidly between academia and industry, from teaching advanced AI courses at the American University of Beirut to advising teams on turning machine learning ideas into systems that can be trusted. He is also known for his candid take on the current AI landscape, especially the gap between meaningful engineering and what he often calls “AI slop.”
In this conversation, Ammar challenges one of the most common assumptions in agent development: that a correct answer is evidence of a successful agent. He explains why reliability lies in the path an agent takes, not just in the result it produces, and why evaluation must evolve from output scoring to a discipline that measures behaviour and trustworthiness in production.
Most teams think they’re evaluating agents, but they’re actually not. Where do you see the biggest illusion of evaluation today?
The biggest illusion is that teams think they are evaluating an agent when they are only evaluating the final answer. That works well for a chatbot. But an agent is different. It plans, chooses tools, passes arguments, reads observations, retries, stops, and sometimes takes action. A final-answer score hides most of the actual failure surface.
An agent can produce a good-looking answer after calling the wrong tool, wasting ten steps, misreading a tool result, or ignoring a failed call. From the outside, the answer may look acceptable. From a reliability perspective, the run is not acceptable.
So the illusion is: “the answer looked right, therefore the agent worked.” However, what you need to know is whether the path was valid, efficient, grounded, and safe.
Read the Full Interview on Substack
Most Claude Code content focuses on prompts and quick wins. This workshop explores what comes next. Join Sam Keen, former engineer at AWS, Lululemon, and Nike, to learn how high-performing teams use structured context, reusable skills, workflow memory, and guardrails to get more consistent results from Claude Code.
🎟️ Exclusive for AI Distilled subscribers: Get 60% off with code AI60. Limited to the first 10 sign-ups.
Register Now
Built something cool? Tell us. Whether it's a scrappy prototype or a production-grade agent, we want to hear how you're putting generative AI to work. Drop us your story at nimishad@packtpub.com or reply to this email, and you could get featured in an upcoming issue of AI_Distilled.
📢 If your company is interested in reaching an audience of developers, technical professionals, and decision makers, you may want to advertise with us.
If you have any comments or feedback, just reply to this email.
Thanks for reading and have a great day!
That’s a wrap for this week’s edition of AI_Distilled 🧠⚙️
We would love to know what you thought. Your feedback helps us keep leveling up.
👉 Drop your rating here
Thanks for reading,
The AI_Distilled Team
(Curated by humans. Powered by curiosity.)
LLM Expert Insights, Packt
19 Jun 2026