Tech insights in brief
VS Code is shipping a default 2-hour delay between a new extension version being published and clients auto-updating to it, on the bet that most malicious-version compromises are detected and yanked from the Marketplace inside that window. The change is opt-out for individuals and configurable centrally for enterprises, and the team published telemetry showing that on recent confirmed incidents the malicious version was indeed pulled within the 2-hour window before broad propagation. This is a meaningful piece of supply-chain hygiene given the steady drumbeat of compromised popular extensions over the last year — Cursor's recent forks have shipped similar mitigations, but VS Code making it the upstream default normalizes the practice for the wider editor ecosystem.
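The gating idea behind that delay is simple enough to sketch. The following is a minimal illustration, not VS Code's implementation: the function name, the `yanked` flag and the `DEFAULT_DELAY` constant are hypothetical, and only the 2-hour figure comes from the item above.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical constant mirroring the 2-hour window described above.
DEFAULT_DELAY = timedelta(hours=2)


def eligible_for_auto_update(published_at, now, delay=DEFAULT_DELAY, yanked=False):
    """Return True once a version has aged past the delay and has not been pulled."""
    if yanked:
        # A version removed from the registry during the window never propagates.
        return False
    return now - published_at >= delay


published = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)  # illustrative timestamp
print(eligible_for_auto_update(published, published + timedelta(minutes=90)))  # False
print(eligible_for_auto_update(published, published + timedelta(hours=2)))     # True
print(eligible_for_auto_update(published, published + timedelta(hours=3), yanked=True))  # False
```

The point of the design is that the client does nothing clever: it simply waits, so any takedown that lands inside the window protects every client that has not yet updated.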
Nick from Cohere posted CohereLabs/BLS-Mini-Code-1.0 to r/LocalLLaMA and asked the community to test it before official release, an unusual move from a frontier lab that normally hides previews behind enterprise NDAs. The model is positioned as Cohere's first dedicated coding model and a follow-up to Command A+. The format is the story: rather than launching to benchmarks, Cohere is iterating in public with a local-first audience, presumably because the open-weights coding arena is now where reputation actually gets made. Worth watching whether the open feedback loop changes the final config (context length, tokenizer, license).
Canary is a new open-source tool that takes a code diff, infers which UI flows are likely affected, then has Claude Code execute those flows in a real browser — capturing video, screenshots, network traffic, HAR, console logs and Playwright traces, and emitting both a pass/fail report and a replayable Playwright script. The framing matters: most 'AI test generation' tools so far produce flaky one-shot scripts; Canary instead lets the agent observe the running app and produces deterministic artifacts that survive into normal CI. If it holds up, this is the first credible 'agent generates and verifies Playwright tests from the PR' loop.
Calif.io published a writeup of a previously undocumented HTTP/2 DoS vector that OpenAI's Codex surfaced while a user was poking at protocol code. The interesting part is not just the bug (a crafted frame sequence that lets a single peer balloon server-side memory out of proportion to the bandwidth it spends) but the discovery path: Codex flagged the pattern as suspicious during routine code review, the user verified it against the spec, and the result was a real CVE-class finding. This is one of the cleaner real-world examples of an AI coding agent doing genuine vulnerability research rather than auto-completing test cases, and it makes a concrete case that 'have Codex grep for protocol footguns' belongs in a security team's review pipeline.
Ammar Askar disclosed a 1-click exploit that uses a VS Code bug to lift a user's GitHub OAuth token simply by getting them to click a link. The chain abuses how VS Code handles a particular URL scheme, lets an attacker-controlled extension or webview obtain the same auth surface as the user's signed-in GitHub session, and ultimately exfiltrates the token without an additional prompt. Both Hacker News (607 points) and Lobsters picked it up as the security writeup of the day. For anyone running VS Code with a GitHub account signed in, which is essentially the whole ecosystem, the headline takeaway is to upgrade to the patched build or later immediately; the secondary one is that GitHub's auth surface inside editors has now had enough incidents to deserve a dedicated threat model.
A Stanford Law study (Salinas et al.) had law professors write out the kind of questions they get asked in office hours, then collected answers from both Gemini 2.5 and human law professors, and finally had other law professors blind-judge the results. Gemini scored a 75% win rate against human professors and, importantly, its answers were rated as less harmful than the humans'. The paper also notes that newer frontier models do even better. Both HN (381 points, 334 comments) and Ethan Mollick's tweet (801 likes) treated this as the headline 'GPT-4 passes the bar' moment for legal pedagogy: the bottleneck for replacing professor-style legal guidance with AI is no longer answer quality on standard student questions; it's the institutional pieces around accreditation, liability and trust.
Chrome 149 ships CSS gap decorations (drawing rules between flex/grid tracks without extra DOM), bfcache-friendly behaviour for sites that disconnect WebSockets cleanly on navigation, and new Intl.Locale variants — the typical 'tasteful platform polish' update. The bigger story is in the DevTools 149 post: 'DevTools for agents' graduates to stable, AI assistance gets a major upgrade that now wires Lighthouse and widget inspection into the AI panel, and there are new WebMCP debugging tools (i.e. first-class debugger surfaces for the MCP-over-the-web flow Chrome has been pushing). The shape is clear: Chrome is positioning DevTools as the canonical front-end debugger for web pages whose authoring loop has an AI agent in it, not just for hand-written code.
OpenAI's latest Codex update is the clearest sign yet that the company no longer sees Codex as just a coding assistant — it's becoming OpenAI's horizontal in-product agent for analysts, marketers, designers and investors. The post introduces three concrete primitives: Codex plugins (third-party tool integrations that show up inside ChatGPT/Codex sessions), Codex sites (shareable agent surfaces for specific workflows) and annotations (structured comments agents can leave on documents and assets for human review). The pitch is that the same agent loop running in IDEs can now run in Notion, Figma and CRM-style surfaces. For teams designing internal AI tooling, this is the first time OpenAI has shipped a sanctioned non-IDE delivery surface for Codex — worth tracking even if you don't ship anything yet.
OpenAI announced that its frontier models and Codex are now available on AWS, ending AWS's awkward position as the one hyperscaler without first-party OpenAI access. Practically, this means AWS customers can call OpenAI models, and run Codex agents, under their existing AWS billing and IAM, instead of routing traffic through Azure OpenAI or directly via OpenAI's API. The deal is also the cleanest signal yet that OpenAI's multi-cloud strategy is real: Microsoft is no longer the exclusive distribution partner. For platform teams that have been blocked on "we can't use Azure" procurement battles, this removes one of the last barriers to standardising on OpenAI inside an AWS estate.
Anthropic published an update expanding Project Glasswing, its internal effort focused on transparency and interpretability tooling around frontier models. The expansion broadens the program's scope and partner set, with more grants, more external research partners and a clearer commitment to publishing interpretability findings even when they are not commercially flattering. The move continues Anthropic's positioning as the lab most willing to underwrite interpretability work that doesn't directly ship in a product. For safety researchers, this is also one of the better-funded venues to do mechanistic interpretability on production-scale Claude variants without having to assemble compute on your own.
Sam Altman publicly opened OpenAI Robotics hiring, looking for full-stack hardware, ops, systems and ML engineers to "program and manufacture robots useful for society." The tweet crossed 12k likes and is the first time OpenAI Robotics has been talked about in the open as a real first-class effort with manufacturing ambitions rather than an exploratory team. Strategically it lands the same day NVIDIA dropped Cosmos 3 and Vera CPU — both pitched at the same physical-AI stack — making physical AI the single most concentrated theme of the day across the industry. Worth watching as a signal of where the next round of frontier-lab capital is going.
Stanford's CS336, "Language Modeling from Scratch", released its full course site, walking through implementing modern LLMs end-to-end: tokenization, transformer architecture, training infrastructure, post-training and evaluation. The course is taught by Tatsu Hashimoto and Percy Liang and is one of the rare graduate-level treatments that actually digs into how to read training-time bottlenecks rather than stopping at the math. The slides and homework assignments are public, which makes it one of the strongest "self-study LLM" curricula currently online. The post reached 222 points on Hacker News within hours.
PromptArmor disclosed a prompt-injection attack in OpenAI's ChatGPT integration for Google Sheets that lets a maliciously crafted spreadsheet (sent or shared to the victim) read other cells in the workbook and exfiltrate them through the model's chat surface. The report walks through the exact prompt chain, the indirect-prompt-injection variant it exploits, and the missing isolation that should have prevented cross-cell reads. This is the most production-relevant prompt-injection demo this month: it lands on a Google Workspace user without anything that looks suspicious, and the attacker controls only the spreadsheet content. Workspace admins should re-check the ChatGPT add-on permissions.
A GitHub issue under RedHatInsights/javascript-clients flagged a series of malicious npm packages that had slipped into the dependency closure of Red Hat's hosted JavaScript clients. The thread traces how the bad packages got there (typosquats plus a compromised maintainer chain) and which downstream Red Hat Cloud Services touched them before they were scrubbed. As a supply-chain incident, this is one more data point that npm-side typosquat and maintainer-compromise attacks are now hitting larger-vendor cloud SDKs, not just hobby projects. If your stack pulls Red Hat JS clients transitively, audit your lockfiles this week.
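A lockfile check of that kind can be scripted in a few lines. The sketch below assumes an npm `package-lock.json` in lockfile version 2 or 3, where installed packages sit under a top-level `packages` object keyed by install path. The flagged name `@example/typo-pkg` and the sample lockfile are placeholders; the real package names have to come from the issue thread itself.

```python
import json


def package_name(lock_key):
    """Extract the package name from a key in the lockfile's "packages" object."""
    # Keys look like "node_modules/a/node_modules/@scope/b";
    # the name is whatever follows the last "node_modules/".
    marker = "node_modules/"
    idx = lock_key.rfind(marker)
    return lock_key[idx + len(marker):] if idx != -1 else lock_key


def find_flagged(lock, flagged):
    """Return sorted (name, version) pairs for installed packages named in `flagged`."""
    hits = set()
    for key, meta in lock.get("packages", {}).items():
        if key == "":
            continue  # the empty key is the root project, not a dependency
        name = package_name(key)
        if name in flagged:
            hits.add((name, meta.get("version", "?")))
    return sorted(hits)


# Placeholder lockfile; in practice use json.load() on your own package-lock.json.
sample_lock = json.loads("""
{
  "lockfileVersion": 3,
  "packages": {
    "": {"name": "demo-app", "version": "1.0.0"},
    "node_modules/left-pad": {"version": "1.3.0"},
    "node_modules/some-client/node_modules/@example/typo-pkg": {"version": "0.0.1"}
  }
}
""")

flagged_names = {"@example/typo-pkg"}  # hypothetical; take real names from the advisory
print(find_flagged(sample_lock, flagged_names))  # [('@example/typo-pkg', '0.0.1')]
```

Because the scan walks every entry under `packages`, it also catches transitive and nested installs, which is where a compromised dependency usually hides. It matches on name only, so compare the reported versions against the advisory before deciding whether an install is affected.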