On June 23, users discovered that ChatGPT had quietly launched Bidi 1, a new two-way voice model. It appears in the model selector alongside the standard and advanced voice modes, and its headline feature is real-time 'listen while speaking': users can interrupt mid-reply with new instructions. This marks a shift from strict turn-taking toward natural, simultaneous conversation. There has been no official announcement yet, and wider testing is expected this week. The launch looks like a strong response to Google's Gemini Live and Anthropic's Claude Voice.
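OpenAI has not described how Bidi 1 works. As a rough illustration of the 'listen while speaking' idea, here is a minimal Python sketch of barge-in handling: playback and listening run concurrently, and any user event cancels the reply in progress. Everything here is hypothetical: `speak`, `converse`, and `demo` are illustrative names, the reply is a list of text chunks standing in for audio, and an `asyncio.Queue` stands in for a speech-detection front end.

```python
import asyncio
from typing import List, Optional, Tuple

async def speak(chunks: List[str], spoken: List[str], delay: float = 0.01) -> None:
    """Play a reply chunk by chunk; `spoken` records what actually got out."""
    for chunk in chunks:
        spoken.append(chunk)
        await asyncio.sleep(delay)  # stand-in for audio playback time

async def converse(chunks: List[str], user_events: asyncio.Queue) -> Tuple[List[str], Optional[str]]:
    """Speak and listen at the same time; a user event cancels the reply in progress."""
    spoken: List[str] = []
    speaker = asyncio.create_task(speak(chunks, spoken))
    listener = asyncio.create_task(user_events.get())
    done, _ = await asyncio.wait({speaker, listener}, return_when=asyncio.FIRST_COMPLETED)
    if listener in done:
        speaker.cancel()  # barge-in: stop talking as soon as the user speaks
        try:
            await speaker
        except asyncio.CancelledError:
            pass
        return spoken, listener.result()
    listener.cancel()  # reply finished uninterrupted; stop waiting for input
    try:
        await listener
    except asyncio.CancelledError:
        pass
    return spoken, None

async def demo(chunks: List[str], interrupt_at: Optional[float] = None,
               instruction: str = "make it shorter") -> Tuple[List[str], Optional[str]]:
    """Run one reply; optionally inject a user instruction after `interrupt_at` seconds."""
    events: asyncio.Queue = asyncio.Queue()

    async def user() -> None:
        await asyncio.sleep(interrupt_at)
        await events.put(instruction)

    interrupter = asyncio.create_task(user()) if interrupt_at is not None else None
    result = await converse(chunks, events)
    if interrupter is not None:
        interrupter.cancel()  # no-op if the instruction was already delivered
    return result
```

For example, `asyncio.run(demo([f"chunk{i}" for i in range(100)], interrupt_at=0.035))` returns only the first few chunks together with `"make it shorter"`. A turn-taking assistant would finish `speak` before reading the queue; here the two tasks race, and whichever finishes first decides what happens next.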
Model Watch
Weekly LLM rankings, capability comparisons, and benchmarks, tracking updates to GPT, Claude, Gemini, and Chinese models.
Alibaba's Qwen team released Qwen-AgentWorld, a native language world model covering MCP, Search, Terminal, SWE, Web, OS, and Android environments. Its core innovation is 'predict-then-act': agents simulate an action's outcome before executing it, which reduces the cost of trial and error. Training used more than 10 million real interaction traces and a CPT→SFT→RL pipeline (continual pre-training, supervised fine-tuning, then reinforcement learning). On AgentWorldBench it scored 58.71, surpassing GPT-5.4 (58.25) and Claude Opus 4.8. The model is fully open source.
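The announcement does not spell out Qwen-AgentWorld's interface, so the following is only a generic Python sketch of a predict-then-act loop: imagine the result of each candidate action with a world model, then execute only the most promising one. The toy number-line task, the `simulate` function (standing in for a learned world model), and the `closeness` scorer are all hypothetical.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

S = TypeVar("S")  # environment state
A = TypeVar("A")  # action

def predict_then_act(state: S, actions: Iterable[A],
                     world_model: Callable[[S, A], S],
                     score: Callable[[S], float],
                     execute: Callable[[S, A], S]) -> Tuple[A, S]:
    """Imagine every candidate with the world model, then execute only the best one."""
    # Predict: simulated outcomes are cheap and have no side effects.
    imagined = [(a, world_model(state, a)) for a in actions]
    # Choose the action whose predicted outcome scores highest (first one wins ties).
    best_action, _ = max(imagined, key=lambda pair: score(pair[1]))
    # Act: a single real, possibly costly or irreversible, execution.
    return best_action, execute(state, best_action)

# Toy task (hypothetical): walk along a number line to reach GOAL.
GOAL = 10
real_actions: List[int] = []

def simulate(pos: int, step: int) -> int:
    return pos + step  # stand-in for a learned world model's prediction

def execute(pos: int, step: int) -> int:
    real_actions.append(step)  # each call here is a real action in the environment
    return pos + step

def closeness(pos: int) -> float:
    return -abs(GOAL - pos)

pos = 0
while pos != GOAL:
    _, pos = predict_then_act(pos, [-2, 1, 3], simulate, closeness, execute)
```

In this run the agent imagines 12 outcomes but performs only 4 real actions (3, 3, 3, 1). With an imperfect world model, predicted and real states could diverge, which is why the result of the real `execute` call, not the prediction, becomes the next state.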
Sky Computing Lab released FastWan-QAD, a series of high-speed video generation models trained with Quantization-Aware Distillation (QAD) using FastVideo. The key selling point is speed: a 5-second 480p video takes just 1.8 seconds on a single RTX 5090, reportedly tens of times faster than traditional methods. In QAD, a full-precision teacher model supervises a quantized student during training, so the student learns to run at lower numerical precision while staying close to the teacher's output quality. The weights and code are open source and free to use, accompanied by blog posts.
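To make the teacher-student idea concrete, here is a minimal pure-Python sketch of one generic QAD training step on a single linear layer. It is not FastWan's recipe: the symmetric per-tensor quantizer, the mean-squared-error distillation loss, the learning rate, and the straight-through estimator (STE) are common textbook choices assumed here for illustration.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]

def fake_quantize(w: Matrix, bits: int = 4) -> Matrix:
    """Symmetric per-tensor fake quantization: snap to an integer grid, then rescale to floats."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for row in w for v in row) or 1.0  # avoid a zero scale
    scale = max_abs / qmax
    return [[max(-qmax, min(qmax, round(v / scale))) * scale for v in row] for row in w]

def matvec(w: Matrix, x: List[float]) -> List[float]:
    """y = W x for a single linear layer."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def distill_loss(student_out: List[float], teacher_out: List[float]) -> float:
    """Mean squared error between student and teacher outputs."""
    return sum((s - t) ** 2 for s, t in zip(student_out, teacher_out)) / len(teacher_out)

def qad_step(latent: Matrix, teacher: Matrix, batch: List[List[float]],
             bits: int = 4, lr: float = 0.05) -> Tuple[Matrix, float]:
    """One QAD update. Returns (updated latent weights, batch loss before the update)."""
    q = fake_quantize(latent, bits)  # the student's forward pass uses quantized weights
    grad = [[0.0] * len(row) for row in latent]
    total = 0.0
    for x in batch:
        t = matvec(teacher, x)  # the full-precision teacher provides the targets
        s = matvec(q, x)
        total += distill_loss(s, t)
        for i in range(len(t)):
            g = 2.0 * (s[i] - t[i]) / (len(t) * len(batch))  # dLoss/ds_i
            for j, xj in enumerate(x):
                grad[i][j] += g * xj
    # STE: treat rounding as identity in the backward pass, so the gradient
    # with respect to the quantized weights updates the latent weights directly.
    new_latent = [[w - lr * gw for w, gw in zip(wr, gr)] for wr, gr in zip(latent, grad)]
    return new_latent, total / len(batch)

if __name__ == "__main__":
    random.seed(0)
    teacher = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
    latent = [row[:] for row in teacher]  # the student starts as a copy of the teacher
    batch = [[random.gauss(0, 1) for _ in range(4)] for _ in range(16)]
    for step in range(50):
        latent, loss = qad_step(latent, teacher, batch)
        if step % 10 == 0:
            print(f"step {step}: distillation loss {loss:.5f}")
```

The student keeps full-precision "latent" weights, but its forward pass only ever sees their quantized versions. The STE lets the gradient keep nudging those latent weights, so a weight can drift across a rounding boundary over several steps even when a single update does not change its quantized value.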
Krea AI, the popular AI image tool, released the full technical report for Krea 2, detailing its data strategy, architecture design, and training techniques. Krea is known for real-time image generation and editing and has a loyal global community of creators. The report covers data-cleaning pipelines, multimodal alignment methods, and inference optimization, offering valuable engineering insights for developers and researchers working on AI image generation.
French AI company Mistral AI released Mistral OCR 4, its next-generation document recognition model, with bounding-box detection, block classification (titles, tables, equations, signatures), and per-word confidence scores. It supports 170 languages across 10 language families, including Chinese, Japanese, Arabic, and Hebrew, and can be self-hosted in a single Docker container for sensitive documents. It scored 85.20 on the OlmOCRBench benchmark, with a 72% annotator preference rate. Pricing is $4 per 1,000 pages, with 50% off through the batch API. The model is aimed at enterprises processing contracts, invoices, and forms at scale.
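The pricing is easy to check by hand: 250,000 pages cost $1,000 at the standard rate, or $500 through the batch API. Per-word confidence scores also lend themselves to routing uncertain text to a human reviewer. The Python sketch below does both. It assumes costs scale linearly with page count, and the `Word` record is a hypothetical stand-in, not Mistral's actual response schema.

```python
from dataclasses import dataclass
from typing import List

STANDARD_USD_PER_1000_PAGES = 4.00  # list price quoted in the announcement
BATCH_DISCOUNT = 0.50               # 50% off via the batch API

def estimate_cost(pages: int, batch: bool = False) -> float:
    """Estimated USD cost, assuming linear per-page proration (an assumption)."""
    rate = STANDARD_USD_PER_1000_PAGES
    if batch:
        rate *= 1 - BATCH_DISCOUNT
    return pages / 1000 * rate

@dataclass
class Word:
    """Hypothetical OCR output record; not Mistral's real schema."""
    text: str
    confidence: float  # assumed to lie in [0, 1]

def words_needing_review(words: List[Word], threshold: float = 0.9) -> List[str]:
    """Return the words whose confidence falls below `threshold`."""
    return [w.text for w in words if w.confidence < threshold]
```

For a page read as `[Word("Invoice", 0.99), Word("#4O17", 0.62), Word("Total", 0.97)]`, only `"#4O17"` would be flagged at the default threshold. The threshold is a tuning knob: lower it to send fewer words to review, raise it to catch more potential misreads.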