AI Weekly - 2026-05-20
Three things worth tracking this week: Anthropic shipping a usable 1M-context Opus 4.7, leaked OpenAI o4 benchmarks, and Mistral becoming the first European lab to clear GPAI Tier-2 compliance.
One-line summary
Opus 4.7 makes 1M context usable: not just the “fits in the window” kind of 1M, but the 96%-recall kind [1][2]. That opens a “stuff it all in” product path alongside RAG, with the biggest implications for document-heavy enterprise scenarios.
Models
Anthropic shipped Claude Opus 4.7, headlining a 1M context window with retained recall [1]. Cohere’s lab notes independently reproduced the RULER 1M result at 96.3% needle-in-haystack accuracy [2], making Opus 4.7 the first public model to cross the 95% mark, which has historically been the divide between “marketed context” and “actually usable context”.
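For readers unfamiliar with how a needle-in-haystack score is tallied, here is a minimal sketch. It is not RULER's harness or Cohere's method; the function name, the substring-match rule, and the toy trials are all illustrative assumptions. The only figure taken from the text above is the 95% mark.

```python
# Hypothetical scoring sketch: each trial hides a "needle" string in a long
# context and checks whether the model's answer contains it.

USABLE_THRESHOLD = 0.95  # the 95% mark described in the paragraph above


def needle_accuracy(trials):
    """Return the fraction of (needle, answer) pairs where the needle was retrieved.

    A trial counts as a hit when the needle appears in the answer,
    ignoring case. This matching rule is an assumption for illustration.
    """
    if not trials:
        raise ValueError("at least one trial is required")
    hits = sum(1 for needle, answer in trials if needle.lower() in answer.lower())
    return hits / len(trials)


# Toy data, invented for illustration only (not benchmark results).
sample = [
    ("7421", "The magic number is 7421."),
    ("blue heron", "I could not find it."),
    ("7421", "7421"),
    ("osprey", "The bird mentioned was an osprey."),
]

score = needle_accuracy(sample)
print(f"accuracy={score:.2f}, usable={score >= USABLE_THRESHOLD}")
```

On the toy data this prints accuracy=0.75, usable=False: three of four needles are found, which falls short of the 95% mark.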
The Information published screenshots of internal OpenAI o4 benchmarks; AIME-25 (competition math) is up 17 percentage points over o3 [3]. If verified at release, o4 reopens the frontier-reasoning gap. Caveat: internal benchmark ≠ shipping model, and the leak omits the chain-of-thought budget.
DeepSeek-V4 briefly topped the LMSYS Chatbot Arena reasoning sub-leaderboard under the anonymous handle bouvardia-r2 [6]. The attribution is not official, but the Chinese community has all but confirmed it’s the new DeepSeek release.
Regulation and compliance
Mistral passed the EU AI Office’s GPAI Tier-2 review, becoming the first European frontier lab to clear that compliance bar [4]. Tier-2 demands complete model cards, third-party red-team reports, and training-data provenance disclosure [5]. Later this year Tier-2 becomes mandatory, which will shape US labs’ EU launch cadence.
Compute
SemiAnalysis reports that NVIDIA Blackwell-2 volume shipments have slipped from Q2 to Q3, blamed on CoWoS-L yield [7]. The hyperscaler trio is rebalancing 2026 training-cluster racking; AWS and Azure are leaning into more aggressive H200 renewals.
Applications
Anthropic opened the Computer Use beta to enterprise customers [8], the first time a major lab has shipped “agentic screen control” as an SLA-backed enterprise product. Access is limited to Bedrock or direct-API accounts spending at least USD 5k/month.
Capex vs revenue
Sequoia updated their AI capex-revenue gap analysis: on an annualised basis, capex now exceeds attributable AI revenue by roughly USD 600B [9]. Their take is more optimistic than last year’s, though: “the gap is real but recoverable in 36–48 months if inference unit-cost continues compounding at -45%/yr”.
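To put the quoted -45%/yr rate in perspective, the sketch below computes the unit-cost multiplier it implies over the 36 and 48 month horizons. It assumes smooth annual compounding and says nothing about revenue or whether the gap actually closes; the function name is illustrative and not from Sequoia's analysis.

```python
def unit_cost_multiplier(annual_change: float, months: int) -> float:
    """Fraction of today's unit cost remaining after `months`.

    Assumes the annual rate compounds smoothly, so fractional years are allowed.
    annual_change is a signed rate, e.g. -0.45 for a 45% yearly decline.
    """
    return (1.0 + annual_change) ** (months / 12.0)


# Horizons and rate taken from the quote above.
for months in (36, 48):
    multiplier = unit_cost_multiplier(-0.45, months)
    print(f"{months} months: unit cost is {multiplier:.3f}x today's level")
```

This prints 0.166x at 36 months and 0.092x at 48 months: under that assumption, serving the same inference workload would cost roughly one sixth to one eleventh of what it does today.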
Ecosystem
Hugging Face published a mid-2026 commercial-use tally for open-weight licences [10]. The clearly commercial-permissive group includes Llama, Mistral, Qwen, and DeepSeek; the ambiguous group (gated by an Acceptable Use Policy) includes Gemma and Falcon.
Sources
- [1] Anthropic releases Claude Opus 4.7 with 1M context window
- [2] Long-context retrieval accuracy crosses 96% on RULER 1M benchmark
- [3] Leaked: OpenAI o4 internal benchmark shows step-change on AIME-25
- [4] Mistral becomes first European frontier lab to clear GPAI Tier-2 compliance
- [5] What GPAI Tier-2 actually requires - documentation, eval, red-team
- [6] DeepSeek-V4 quietly tops Chatbot Arena reasoning leaderboard
- [7] NVIDIA Blackwell-2 supply pulled back to Q3, hyperscalers re-plan
- [8] Anthropic's Computer Use is now available in beta to enterprise tier
- [9] Sequoia: AI capex still pacing ahead of revenue, gap widens to USD 600B
- [10] Open-source weight licensing: a tally of who allows commercial use mid-2026