AI Weekly - 2026-05-20

ai, tech, foundation-models

Three things worth tracking this week: Anthropic shipping a usable 1M-context Opus 4.7, leaked OpenAI o4 benchmarks, and Mistral becoming the first European frontier lab to clear GPAI Tier-2 compliance.

One-line summary

Opus 4.7 makes 1M context usable: not the “fits in the window” kind of 1M, but the kind that holds roughly 96% needle-in-haystack recall at full length [1][2]. That opens a “stuff it all in” product path alongside RAG, with the biggest implications for document-heavy enterprise scenarios.

Models

Anthropic shipped Claude Opus 4.7, headlining a 1M context window with retained recall [1]. Cohere’s lab notes independently reproduced the RULER 1M result at 96.3% needle-in-haystack accuracy [2], making Opus 4.7 the first public model to cross the 95% mark, which has historically been the divide between “marketed context” and “actually usable context”.

The Information published screenshots of internal OpenAI o4 benchmarks; AIME-25 (competition math) is up 17 percentage points over o3 [3]. If verified at release, o4 would reopen the gap at the reasoning frontier. Caveats: an internal benchmark is not a shipping model, and the leak omits the chain-of-thought budget.

A model under the anonymous handle bouvardia-r2, widely believed to be DeepSeek-V4, briefly topped the LMSYS Chatbot Arena reasoning sub-leaderboard [6]. The Chinese community has all but confirmed it is the new DeepSeek release.

Regulation and compliance

Mistral passed the EU AI Office’s GPAI Tier-2 review, becoming the first European frontier lab to clear that compliance bar [4]. Tier-2 demands complete model cards, third-party red-team reports, and training-data provenance disclosure [5]. Later this year Tier-2 becomes mandatory, which will shape US labs’ EU launch cadence.

Compute

SemiAnalysis reports that NVIDIA Blackwell-2 volume shipments have slipped from Q2 to Q3, a delay blamed on CoWoS-L yield [7]. The hyperscaler trio is re-planning its 2026 training-cluster build-outs; AWS and Azure are leaning toward more aggressive H200 renewals.

Applications

Anthropic opened the Computer Use beta to enterprise customers [8], the first time a major lab has shipped “agentic screen control” as an SLA-backed enterprise product. Access is limited to Bedrock or direct-API accounts spending at least USD 5k/month.

Capex vs revenue

Sequoia updated its AI capex-revenue gap analysis: on an annualised basis, capex now exceeds attributable AI revenue by roughly USD 600B [9]. Its take is more optimistic than last year’s, though: “the gap is real but recoverable in 36–48 months if inference unit-cost continues compounding at -45%/yr”.
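
To get a feel for the compounding assumption in that quote, the short Python sketch below computes what a -45%/yr decline implies for inference unit cost at the 36- and 48-month marks. It assumes smooth annual compounding from a baseline of 1.0. The function name relative_unit_cost is illustrative and not taken from Sequoia's analysis, and the sketch says nothing about capex or revenue themselves.

```python
def relative_unit_cost(annual_change: float, months: int) -> float:
    """Unit cost after `months`, relative to today's cost of 1.0.

    Assumes the annual rate compounds smoothly, so a fractional year
    contributes a fractional exponent. `annual_change` is -0.45 for -45%/yr.
    """
    return (1.0 + annual_change) ** (months / 12.0)


if __name__ == "__main__":
    # The two ends of the 36-48 month window quoted above.
    for months in (36, 48):
        cost = relative_unit_cost(-0.45, months)
        print(f"{months} months: {cost:.3f}x today's unit cost")
```

Under these assumptions, unit cost falls to roughly 17% of today's level after 36 months and roughly 9% after 48, which is the scale of decline the recovery scenario depends on.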

Ecosystem

Hugging Face published a mid-2026 commercial-use tally for open-weight licences [10]: the clearly commercial-permissive group includes Llama, Mistral, Qwen, and DeepSeek; the ambiguous group (gated by an Acceptable Use Policy) includes Gemma and Falcon.

Sources

  [1] Anthropic releases Claude Opus 4.7 with 1M context window. Anthropic Blog.
  [2] Long-context retrieval accuracy crosses 96% on RULER 1M benchmark. Lab Notes - Cohere.
  [3] Leaked: OpenAI o4 internal benchmark shows step-change on AIME-25. The Information.
  [4] Mistral becomes first European frontier lab to clear GPAI Tier-2 compliance. Reuters.
  [5] What GPAI Tier-2 actually requires - documentation, eval, red-team. EU AI Office Briefing.
  [6] DeepSeek-V4 quietly tops Chatbot Arena reasoning leaderboard. LMSYS.
  [7] NVIDIA Blackwell-2 supply pulled back to Q3, hyperscalers re-plan. SemiAnalysis.
  [8] Anthropic's Computer Use is now available in beta to enterprise tier. TechCrunch.
  [9] Sequoia: AI capex still pacing ahead of revenue, gap widens to USD 600B. Sequoia Capital Perspectives.
  [10] Open-source weight licensing: a tally of who allows commercial use mid-2026. Hugging Face Blog.