Qwen3.5-397B-A17B
Unified vision-language MoE model with 397B total / 17B active params. Uses Gated Delta Networks + sparse MoE for efficient inference. Supports 201 languages and achieves cross-generational parity with Qwen3.
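As a rough illustration of the total-vs-active parameter split, here is a minimal top-k expert-routing sketch in NumPy. The dimensions, expert count, and gating scheme are toy assumptions for illustration, not Qwen3.5's actual architecture.

```python
import numpy as np

# Toy sparse-MoE router: only k of n_experts run per token, so the
# active parameter count is a small fraction of the total (as in the
# 17B-of-397B figure above). All sizes here are illustrative.
rng = np.random.default_rng(0)
d_model, n_experts, k = 64, 8, 2

experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
w_gate = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x):
    """Route a single token vector through its top-k experts only."""
    logits = x @ w_gate                       # one score per expert
    top = np.argsort(logits)[-k:]             # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                  # softmax over the selected experts
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

x = rng.standard_normal(d_model)
y = moe_forward(x)
active_frac = k / n_experts                   # fraction of expert params used per token
```

Only the gate and the k chosen experts do work per token; the other experts' weights sit idle, which is why active params can be ~4% of the total.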
View model ↗
Top AI News Weekly
Qwen3.5 (397B MoE) and Gemini 3.1 Pro push frontier reasoning. ByteDance launches Seed2.0 model family. TTS explodes with Ming-omni-tts, Kani-TTS-2, LuxTTS, and KittenTTS. MonarchRT enables real-time 16 FPS video gen on a single GPU. ZUNA pioneers brain-data foundation models.
36 launches and research drops that matter for enterprise AI builders—curated, tagged, and ready for your next roadmap sync.
New drops: 36 · Unique sources: 32 · Key themes: Immersive · Developer · Frontier
New reasoning systems, world models, and alignment papers that move the state of the art.
Major reasoning upgrade scoring 77.1% on ARC-AGI-2, more than double Gemini 3 Pro's score. Rolling out across the Gemini API, AI Studio, Vertex AI, Gemini CLI, Google Antigravity, and NotebookLM.
View release ↗
ByteDance's second-gen foundation model family with Pro, Lite, and Mini tiers. Pro focuses on long-chain reasoning for complex workflows; Lite balances quality and speed; Mini optimizes throughput.
View release ↗
Compact 3B-parameter language model designed for efficient deployment with strong performance on reasoning and general tasks.
View model ↗
High-reasoning GGUF distillation of Qwen3-14B using Claude 4.5 Opus as the teacher model. Offers Opus-level reasoning in a locally runnable 14B package.
View model ↗
Open-weights 3.35B multilingual model optimized for balanced representation across 70+ languages, including many lower-resourced ones. Supports downstream adaptation and local deployment.
View model ↗
First brain-data foundation model: a 380M-param diffusion autoencoder trained on scalp-EEG signals for denoising, channel reconstruction, and novel signal prediction. A step toward thought-to-text.
View release ↗
Qwen3.5 now available through Alibaba Cloud Model Studio with API access, documentation, and enterprise deployment options.
View release ↗
Video, audio, and physics-native generation techniques shaping spatial computing.
Unified visual creation model supporting T2V, T2I, instruction-based video-to-video, and image editing. Built on diffusion + transformer architecture with multi-GPU distributed inference.
View model ↗
General-purpose image editing model with native editing from T2I foundations. Supports text style preservation, old photo restoration, multi-image editing, and virtual try-on. Includes a 1,673-pair bilingual benchmark.
View model ↗
Unified audio generation framework for speech, music, and sound effects with cross-modal control and composition capabilities.
View project ↗
Novel vector-to-pixel image generation approach enabling precise control over image synthesis from vector representations.
View project ↗
Advanced video generation technique using anchor-based temporal weaving for improved consistency and motion coherence in synthesized videos.
View project ↗
Learning-based unified video editing framework that provides consistent, high-quality edits across temporal sequences.
View project ↗
Framework that transforms code descriptions into interactive 3D worlds using AI-driven scene generation and spatial reasoning.
View project ↗
Google introduces AI-powered music generation capabilities within the Gemini ecosystem for creative audio production.
View release ↗
Unified 0.5B audio model generating speech, music, and sound in one channel. A custom 12.5 Hz tokenizer plus patch-by-patch compression yields 3.1 Hz inference. 93% Cantonese accuracy and SOTA emotion control.
View model ↗
English text-to-speech model with natural prosody and high-quality voice synthesis for production use cases.
View model ↗
Open-source text-to-speech model delivering premium voice quality with support for multiple speaking styles and emotional tones.
View model ↗
Lightweight TTS engine designed for fast inference and easy integration into applications. Open-source with a simple API.
View repo ↗
Embodied agents learning to act in complex virtual and hybrid worlds.
Personal AI assistant running on any OS/platform. Supports WhatsApp, Telegram, Slack, Discord, Signal, iMessage, and Teams with agent-to-agent sessions and a skills registry (ClawHub).
View repo ↗
Fully autonomous AI agent system for penetration testing. Multi-agent architecture with Langfuse integration, knowledge graphs (Graphiti), and containerized execution environments.
View repo ↗
Open-source DevOps agent in Rust that ships code on autopilot. Lives on machines 24/7 with adaptive intelligence, security hardening, and MCP/ACP protocol support.
View repo ↗
Context management framework for AI agents providing unified memory, retrieval, and state persistence across multi-turn interactions.
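As a generic illustration of what unified memory and retrieval means for a multi-turn agent, here is a toy store-and-rank sketch. The class name and the keyword-overlap scoring are invented for this example; they are not the project's actual API.

```python
# Toy agent-memory layer: append turns, then retrieve the most relevant
# ones for a new query. Real frameworks use embeddings and persistence;
# this sketch uses naive keyword overlap purely to show the shape of the API.
class AgentMemory:
    def __init__(self):
        self.turns = []  # ordered (role, text) pairs across the conversation

    def remember(self, role, text):
        self.turns.append((role, text))

    def retrieve(self, query, top_n=2):
        """Rank stored turns by word overlap with the query."""
        q = set(query.lower().split())
        scored = sorted(
            self.turns,
            key=lambda t: len(q & set(t[1].lower().split())),
            reverse=True,
        )
        return [text for _, text in scored[:top_n]]

mem = AgentMemory()
mem.remember("user", "my deploy target is eu-west-1")
mem.remember("assistant", "noted the region")
mem.remember("user", "use terraform for deploy")
hits = mem.retrieve("which region is the deploy target")
```

The value of such a layer is that the agent loop only ever calls `remember` and `retrieve`; swapping keyword overlap for vector search or a database changes nothing upstream.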
View repo ↗
AI-powered shopping platform with intelligent product discovery, comparison, and purchasing workflows for e-commerce automation.
View release ↗
Minimalist AI agent framework focused on zero-configuration setup and rapid prototyping for conversational AI applications.
View repo ↗
Ultra-lightweight agent runtime designed for resource-constrained environments with minimal memory footprint and fast startup.
View repo ↗
Frameworks, playbooks, and OSS repos that level up AI engineering velocity.
Sparse attention method for video generation DiTs using Monarch matrices. Achieves 95% attention sparsity with no quality loss and 1.4–11.8x speedup over FlashAttention. Enables real-time 16 FPS video on a single RTX 5090.
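A rough sketch of the structure behind Monarch matrices: an n-by-n dense map is replaced by two block-diagonal factors joined by a fixed permutation, cutting per-vector cost from O(n^2) to O(n^1.5). Sizes here are toy values, and MonarchRT's actual attention kernels are assumed to be far more involved than this.

```python
import numpy as np

# Monarch-style structured matmul: y = L P R x, where L and R are
# block-diagonal and P is a stride permutation. With b = sqrt(n) blocks
# of size b x b, cost per vector is 2*n*b multiply-adds instead of n*n.
rng = np.random.default_rng(0)
n = 16
b = 4                                        # block size, with n = b * b

L_blocks = rng.standard_normal((b, b, b))    # b blocks, each b x b
R_blocks = rng.standard_normal((b, b, b))

def monarch_apply(x):
    """Apply the structured map without ever materializing an n x n matrix."""
    x = x.reshape(b, b)
    x = np.einsum("ikj,ij->ik", R_blocks, x)  # block-diagonal R
    x = x.T.copy()                            # stride permutation P
    x = np.einsum("ikj,ij->ik", L_blocks, x)  # block-diagonal L
    return x.reshape(n)

y = monarch_apply(rng.standard_normal(n))
dense_cost, monarch_cost = n * n, 2 * n * b   # 256 vs 128 multiply-adds here
```

At toy size the saving is only 2x, but the n^1.5-vs-n^2 gap widens with sequence length, which is consistent with the larger speedups quoted at video-scale attention.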
View project ↗
Unified inference and post-training framework for accelerated video generation. Features sparse distillation (>50x denoising speedup), Video Sparse Attention, and FSDP2 scalable training across H100/A100/4090.
View repo ↗
Build modular and scalable LLM applications in Rust. Supports multiple providers with composable agents, tools, and RAG pipelines. Used by 500+ downstream projects.
View repo ↗
Specialized 3B OCR VLM trained on business documents and scientific articles. Supports full-page reading, grounded text detection with bounding boxes, and localized reading within user-specified regions.
View model ↗
Compact OCR model optimized for document digitization with high accuracy on complex layouts and multi-language text recognition.
View model ↗
Comprehensive blog post comparing cutting-edge open OCR models, evaluation methodology, and tools for running models locally and remotely.
View model ↗
AWS CDK infrastructure stacks for deploying AI workloads with production-ready patterns for compute, storage, and networking.
View repo ↗
High-performance vector encoding library from Alibaba optimized for large-scale similarity search and embedding operations.
View repo ↗
In-depth essay exploring the trajectory toward ubiquitous AI adoption, covering infrastructure scaling, edge deployment, and enterprise integration challenges.
View release ↗