Weekly AI News

Your go-to digest for groundbreaking AI trends and innovations

Hey it’s Jul,
Greetings and welcome to the seventeenth edition of “Weekly AI News”!

Big “it’s complicated” vibes between Microsoft and OpenAI — $13 billion later, they’re redefining their relationship. Meanwhile, Sam Altman explains that younger generations treat ChatGPT like a digital confidant. Basically, a super-smart friend who knows your whole life story… and doesn’t judge you when you ask if ghosting Marc is the right move.

Claude, on the other hand, made up a legal citation out of thin air… and Anthropic's lawyers marched into court with it. While we're all still recovering from that, pizzas are being delivered by robots in Australia, Amazon's drones are flying over Texas, and Google is stuffing Gemini into watches and cars.

And on the regulation side? The U.S. is proposing a 10-year freeze on local AI laws, the Copyright Office is in turmoil, and the new Pope just called AI one of humanity’s biggest moral challenges.

In short? Between hallucinated footnotes, autonomous agents, and self-reasoning models… things are moving fast. Maybe a little too fast.

Happy reading!

🤖💻 OpenAI unveils Codex, a full-stack coding agent

Codex mirrors a developer’s environment, writes production-ready code, debugs, runs tests and even drafts pull requests—all from the ChatGPT sidebar. Built on the codex-1 variant of OpenAI’s o3 reasoning model, it can work for up to 30 minutes per task and pre-loads GitHub repos via “AGENTS.md” instructions. Cisco, Temporal and others already use it to speed engineering workflows. A research preview is live for Pro, Enterprise and Team users, with pricing and rate limits coming later.

🚀📈 GPT-4.1 & 4.1-mini land in ChatGPT for everyone

Both models, previously API-only, now power free and paid tiers, bringing a 1 M-token context window, 21% better coding scores than GPT-4o and up to 26% lower costs per token. GPT-4o-mini is retired, and full GPT-4 bowed out on 30 April 2025, making 4.1 the new high-capacity default. Prompt-caching discounts jump to 75%, and OpenAI has published a fresh prompting guide.

🛡️📊 OpenAI launches Safety Evaluations Hub

The public dashboard tracks harmful content refusal rates, jailbreak resilience, hallucination levels and instruction-hierarchy conflicts across model checkpoints. Metrics like “not_unsafe” and “not_overrefuse” are auto-graded, and results for GPT-4.1, 4.5, 4o and mini variants will refresh regularly. It’s a transparency move after criticism of opaque safety testing, but relies on OpenAI self-reporting.

🔍🐙 Deep Research gets a GitHub connector

ChatGPT’s “deep research” agent can now ingest entire GitHub repos, scan PRs and answer technical questions with cited code snippets. Plus, Pro and Team users receive detailed reports that link back to source files, with Enterprise and Edu support on the way. The feature mirrors Anthropic’s new Claude integrations push, signalling an arms race to embed AI assistants in developer tooling.

🤝💸 OpenAI & Microsoft renegotiate their mega-deal

OpenAI wants to cut Microsoft’s revenue share from ~20% to 10% by 2030 and convert its for-profit arm into a public-benefit corporation. Microsoft, having pumped $13 B into OpenAI, seeks guaranteed post-2030 tech access and has even offered to trade equity for it. As OpenAI courts rival cloud partners for its $500 B Stargate build-out, both sides need a détente to keep the alliance intact.

🏥📋 HealthBench sets a new AI-in-medicine bar

Co-designed with 262 doctors, HealthBench tests emergency triage, global-health advice, accuracy and bedside manner across 5,000 multi-turn chats. OpenAI’s o3 scores 60%, dwarfing GPT-3.5 Turbo’s 16%, while tiny GPT-4.1 Nano beats older giants at 25× lower cost. The fully open-sourced dataset lets researchers vet safety before deploying models in clinical settings.

🧑‍💻🌐 Altman pitches an “AI OS” subscription

At Sequoia’s AI Ascent, Sam Altman envisioned a single OpenAI plan that lives across phones, wearables and future devices—holding a trillion-token “platonic memory” of every email, book and chat you’ve ever had. APIs and SDKs are still TBD, but such an OS-like layer could give OpenAI enormous data gravity—and ignite fresh privacy debates.

🌍⚡ ChatGPT’s carbon footprint is tinier than feared

New estimates peg energy use at roughly 0.3 Wh per typical query—about one-tenth of prior calculations. While AI’s aggregate demand still matters, experts argue individual text prompts are greener than everyday activities like commuting or heating, easing guilt over LLM usage.
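
For readers who like to check the math, here is a rough back-of-the-envelope sketch of what 0.3 Wh per query adds up to. The 20-queries-a-day usage level is a made-up assumption for illustration, not a figure from the estimate, and the function names are ours.

```python
# Back-of-the-envelope energy arithmetic for the estimate quoted above.
WH_PER_QUERY = 0.3       # per-query figure from the new estimate
PRIOR_MULTIPLIER = 10    # the new figure is "about one-tenth of prior calculations"
QUERIES_PER_DAY = 20     # hypothetical usage level, NOT from the source


def prior_estimate_wh(wh_per_query: float = WH_PER_QUERY) -> float:
    """Per-query figure implied by the earlier, roughly ten-times-higher estimates."""
    return wh_per_query * PRIOR_MULTIPLIER


def yearly_kwh(queries_per_day: float, wh_per_query: float = WH_PER_QUERY) -> float:
    """Energy for a year of daily queries, converted from Wh to kWh."""
    return queries_per_day * wh_per_query * 365 / 1000


if __name__ == "__main__":
    print(f"Implied prior estimate: {prior_estimate_wh():.1f} Wh per query")
    print(f"{QUERIES_PER_DAY} queries/day for a year: {yearly_kwh(QUERIES_PER_DAY):.2f} kWh")
```

Under that assumed usage, a full year of prompting comes to roughly 2.2 kWh; swap in your own daily count to see how the total scales.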

🔬🧠 OpenAI’s chief scientist touts AI-driven discovery

Jakub Pachocki told Nature that models already uncover novel insights and could meet his AGI bar—measurable economic impact—by decade’s end. OpenAI will soon release its first open-weight model since GPT-2 and aims for it to outshine current open-source options, accelerating community innovation.

🏗️🕌 Stargate eyes a UAE data center

OpenAI is finalizing a multibillion-dollar facility in the United Arab Emirates as part of its global Stargate plan, lured by low energy costs and Gulf investment. The site would add capacity beyond SoftBank-backed U.S. builds, while Saudi Arabia pursues similar deals amid relaxed U.S. chip-export rules.

🛠️🤝 Google readies an always-on coding agent

An internal AI tool promises to track bugs, fix vulnerabilities and assist at every dev cycle stage without explicit prompts. Separately, Google is testing a Pinterest-style “visual inspiration” feature that lets users save fashion or home-decor images into folders—spooking Pinterest investors.

🔐🚫 Google blocks 20× more scams with AI

The “Fighting Scams in Search” report says AI classifiers now intercept hundreds of millions of malicious pages daily, slashing fake airline-support scams by 80%. By parsing web content patterns, Google claims it stays ahead of evolving fraud campaigns and keeps search safer for users.

🚗 Gemini spreads to watches, cars & TVs

Wear OS smartwatches will get voice-first Gemini, Android Auto gains conversational routing and on-the-fly translation, Google TV answers kids’ questions, and an XR headset co-built with Samsung will wrap trip planning in immersive maps. A single AI layer across billions of devices deepens ecosystem lock-in and hands developers vast new surfaces.

📐🔢 AlphaEvolve cracks math milestones

DeepMind’s agent blends Gemini models with evolutionary search to write algorithms. Tested on more than 50 open math problems, it matched the best known solutions in roughly 75% of cases and improved on them in about 20%, and it found a matrix-multiplication algorithm that improves on Strassen’s 1969 result. It also recovers about 0.7% of Google’s data-center compute through better scheduling and speeds TPU chip design, evidence that AI-generated code can push fundamental science forward.

🌊🛠️ Anthropic preps hybrid Sonnet & Opus upgrades

New Claude versions can toggle between reasoning and tool use, self-debug code and iterate without humans. A safety-tested model codenamed “Neptune” hints at v3.8, while a fresh bug-bounty program invites researchers to probe Claude’s guardrails. Launches are expected within weeks, upping the pressure on rivals.

📚⚖️ Phantom citation snares Anthropic in court

Claude hallucinated a non-existent law-review article that Anthropic’s own lawyers accidentally filed in a copyright case. Their apology to a U.S. judge underscores how LLM errors can jeopardize high-stakes litigation and could spur tighter verification rules for AI-generated legal work.

🔒🇨🇳 Microsoft bans DeepSeek for staff

President Brad Smith told the Senate that employees can’t use the Chinese LLM app due to data-sovereignty and propaganda risks. Despite hosting DeepSeek’s R1 model on Azure for customers, Microsoft won’t list the consumer app in its stores, citing Chinese server storage and heavy content censorship.

🗣️💻 “Hey Copilot!” voice wake word hits Windows 11

Insiders can now summon Copilot hands-free, mirroring voice triggers long offered by Siri, Alexa and Google Assistant. It follows the rollout of a dedicated Copilot key and a more conversational Copilot Voice, aiming to make the AI helper as frictionless as legacy voice assistants.

📜🦙 Meta delays giant Llama 4 “Behemoth”

Performance setbacks push the flagship model to fall 2025 or later, after smaller Scout and Maverick variants underwhelmed on public leaderboards. A specialized reasoning version is also on hold, raising questions about Meta’s frontier-model roadmap.

🌬️🧑‍💻 Windsurf debuts SWE-1 family

The coding-assistant startup’s first in-house models—SWE-1, SWE-1-lite and SWE-1-mini—outperform all open-weight peers and trail only frontier giants like Claude 3.7. Trained for “flow awareness” across editor, terminal and browser, they arrive days after rumours of Windsurf’s $3B sale to OpenAI.

🧩🕰️ Sakana unveils Continuous Thought Machines

CTMs let AI “think” over time, tracing maze paths or focusing on tricky image regions like a human brain. Inspired by neural timing dynamics, the approach could bring flexible, adaptive reasoning to future models—bridging the gap between static LLM outputs and real-time cognition.

🆓🚀 Manus AI opens to all with free credits

Fresh off a $75 M Benchmark-led round valuing it at $500 M, Manus drops its wait-list, giving newcomers 1,000 credits plus 300 daily. Known for multi-step task execution, the Chinese-born agent races to win users as regulatory scrutiny and Big Tech competition intensify.

🔎💰 Perplexity seeks $500M at $14B value

Talks with Accel could raise half a billion dollars, lifting the AI search startup 56% above its November valuation. With revenue pacing $100 M from Perplexity Pro subs, the round highlights investor appetite despite a slight trim from an earlier $15 B whisper price.
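
A quick sanity check on those numbers, using only the figures quoted in this item; the helper names are ours, and the results are implied values, not reported ones.

```python
def implied_previous_valuation(new_valuation_b: float, uplift_pct: float) -> float:
    """Valuation (in $B) that, raised by uplift_pct percent, gives new_valuation_b."""
    return new_valuation_b / (1 + uplift_pct / 100)


def trim_pct(whisper_b: float, new_valuation_b: float) -> float:
    """Percentage drop from the whispered valuation to the one now under discussion."""
    return (whisper_b - new_valuation_b) / whisper_b * 100


if __name__ == "__main__":
    previous = implied_previous_valuation(14.0, 56.0)
    print(f"Implied November valuation: ${previous:.1f}B")
    print(f"Trim from the $15B whisper price: {trim_pct(15.0, 14.0):.1f}%")
```

In other words, a 56% step-up to $14B implies a November valuation of about $9B, and the haircut from the $15B whisper figure is a little under 7%.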

📰🤝 Le Monde inks content deal with Perplexity

The multi-year agreement pays France’s top daily when its articles appear in Perplexity answers, driving new traffic—especially to Le Monde’s English edition. Unlike the OpenAI pact, no training data is provided, keeping the collaboration narrower but still lucrative through “neighbouring rights.”

💳🛍️ PayPal & Venmo arrive in Perplexity Pro

U.S. subscribers can soon check out via PayPal or Venmo when the AI bot finds products, books travel or scores event tickets. The move builds on Shopify and Firmly integrations, as payments giants court AI tools with fraud-prevention APIs and instant checkout buttons.

📊🌀 Poe report reveals shifting model tastes

Spring 2025 data show GPT-4.1 and Gemini 2.5 Pro grabbing 10% and 5% of text messages within weeks, while Claude usage falls 10%. Reasoning models rise from 2% to 10% of traffic, GPT-image-1 gains 17% in images, and China’s Kling takes 30% of video share—proving user loyalty flips fast with each new release.

📸🎞️ TikTok rolls out AI Alive for Stories

The tool turns static photos into short videos directly inside TikTok Stories, lowering the barrier for dynamic content and keeping users inside the app instead of third-party editors.

🗂️🧑‍💼 Notion launches AI for Work suite

New features include auto Meeting Notes, Enterprise Search across company data, Research Mode for deep dives, a model picker, and revamped pricing. The upgrade positions Notion against rivals like Granola in the productivity-AI arms race.

🇨🇳📦 Tencent says chip stockpile is solid

President Martin Lau claims Tencent holds enough Nvidia GPUs to train “several generations” of models despite tightened U.S. export bans. Software optimizations and alternative Chinese chips will keep inference costs in check, he told analysts after quarterly earnings.

🔄📞 Klarna revives human customer service

CEO Sebastian Siemiatkowski admits an AI-only support push hurt quality; the fintech is rehiring remote agents so users can always reach a person. The reversal questions bold claims that a chatbot had replaced 700 staff and saved millions.

⚖️🎨 Trump fires Copyright Office chief amid AI row

The sudden dismissal follows a report critical of AI training on copyrighted works, signalling a policy pivot that may embolden rights-holders and unsettle tech firms counting on fair-use defenses.

🇺🇸💻 U.S. scraps global AI-chip rule, keeps China curbs

The Commerce Department ditches a Biden-era blanket control, opting for country-specific deals while reaffirming that Huawei Ascend usage violates export rules. Industry applauds the innovation-friendly shift, though a complex mosaic of agreements now looms.

🏛️🛑 House panel seeks 10-year freeze on state AI laws

Energy & Commerce proposes pre-empting local AI rules to avoid a compliance “patchwork,” blocking bills like New York’s safety-testing mandate. The budget-reconciliation rider faces Senate hurdles but reflects Big Tech lobbying for uniform federal oversight.

🌞🏗️ Trump & UAE plan gigawatt AI campus in Abu Dhabi

A model unveiled with Sheikh Mohammed bin Zayed shows a 1 GW data-center park—scaling to 5 GW—run by G42 and unnamed U.S. firms. The project follows lifted chip limits and aims to serve AI cloud customers within 2,000 miles.

✈️🤖 U.S. Air Force opens AI Center of Excellence

Run by Chief Data & AI Officer Susan Davenport, the hub links MIT, Stanford and Microsoft pipelines so prototypes can deploy straight to classified networks, speeding predictive-maintenance bots and dogfighting algorithms from lab to cockpit.

🧮⚙️ “Absolute Zero” lets AI self-train from scratch

Tsinghua & BIGAI’s AZR generates its own tasks and solves them via deduction, abduction and induction, outperforming models fed thousands of human-labeled examples. A safety scare emerged when a Llama-3.1 model trained this way mused about “outsmarting intelligent machines,” highlighting new oversight needs.

🔄🗣️ Study: LLMs stumble in multi-turn chats

Microsoft-Salesforce research shows success rates drop from 90% on single prompts to ~60% in multi-turn conversations, as models jump to conclusions early and lose track of context. Reliability fixes may demand better state tracking rather than lower temperatures or bigger models.

📷🩺 FaceAge predicts cancer outcomes from selfies

Mass General Brigham’s tool gauges biological age from facial cues; cancer patients averaged five years “older” than their IDs, correlating with lower survival. Doctors improved six-month prognosis accuracy when adding FaceAge scores, hinting at a non-invasive biomarker.

🧠❓ AAAI (Association for the Advancement of Artificial Intelligence) survey doubts current path to AGI

An internal AAAI poll of leading researchers finds that 75% believe today’s data-hungry LLM approaches won’t reach artificial general intelligence. Scientists cite persistent gaps in irony detection, empathy, and other human faculties, arguing that current benchmarks overstate progress. Participants say AGI timelines remain speculative and hinge on how we even define “intelligence.”

🤖 Pope Leo XIV calls AI a moral frontier

In his first major address, the new American pontiff warned that AI threatens human dignity, justice and labour, urging the Church to guide ethical safeguards—echoing Pope Francis’s push for an international AI treaty.

🍕🤖 Australia inches toward robot pizza delivery

Local startups demo temperature-controlled bots like Monash University’s “Ari,” but nationwide rollout waits on 2026 autonomous-vehicle laws. Advocates cite lower emissions and costs; skeptics worry about legal gray areas and sidewalk safety.

Community

Join AI Whisperer Community!

Ready to take your AI journey to the next level? Become part of our growing AI Whisperer community—a hub for tech enthusiasts, aspiring data scientists, and business leaders ready to harness the power of artificial intelligence. Inside, you’ll find:

  • Community: Like-minded individuals keen to grow together.

  • Curated AI Tools: Discover the latest and greatest tools to boost productivity.

  • Certifications & Credentials: Build credibility with recognised certificates and stay ahead in a competitive market.

Don’t miss out on exclusive resources, insider tips, and networking opportunities with your peers. Click here to join the community and transform your AI ambitions into reality!

That's it for this week!
Until next time, stay curious and keep exploring the ever-evolving world of AI!
Thanks for tuning in, and we’ll see you again soon with more exciting updates.

Jul