Qwen3.5-397B-A17B
Unified vision-language MoE model with 397B total / 17B active params. Uses Gated Delta Networks + sparse MoE for efficient inference. Supports 201 languages and achieves cross-generational parity with Qwen3.
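As a rough illustration of the total-vs-active parameter split, here is a minimal top-k expert-routing sketch in NumPy. The dimensions, expert count, and gating scheme are toy assumptions for illustration, not Qwen3.5's actual architecture.

```python
import numpy as np

# Toy sparse-MoE router: only k of n_experts run per token, so the
# active parameter count is a small fraction of the total (as in the
# 17B-of-397B figure above). All sizes here are illustrative.
rng = np.random.default_rng(0)
d_model, n_experts, k = 64, 8, 2

experts = [rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(n_experts)]
w_gate = rng.standard_normal((d_model, n_experts)) * 0.02

def moe_forward(x):
    """Route a single token vector through its top-k experts only."""
    logits = x @ w_gate                       # one score per expert
    top = np.argsort(logits)[-k:]             # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                  # softmax over the selected experts
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

x = rng.standard_normal(d_model)
y = moe_forward(x)
active_frac = k / n_experts                   # fraction of expert params used per token
```

Only the gate and the k chosen experts do work per token; the other experts' weights sit idle, which is why active params can be ~4% of the total.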
View model ↗
Top AI News Weekly
Qwen3.5 (397B MoE) and Gemini 3.1 Pro push frontier reasoning. ByteDance launches Seed2.0 model family. TTS explodes with Ming-omni-tts, Kani-TTS-2, LuxTTS, and KittenTTS. MonarchRT enables real-time 16 FPS video gen on a single GPU. ZUNA pioneers brain-data foundation models.
36 launches and research drops that matter for enterprise AI builders—curated, tagged, and ready for your next roadmap sync.
New drops: 36 · Unique sources: 32 · Key themes: Immersive · Developer · Frontier
New reasoning systems, world models, and alignment papers that move the state of the art.
Major reasoning upgrade scoring 77.1% on ARC-AGI-2, more than double Gemini 3 Pro's score. Rolling out across the Gemini API, AI Studio, Vertex AI, Gemini CLI, Google Antigravity, and NotebookLM.
View release ↗
ByteDance's second-gen foundation model family with Pro, Lite, and Mini tiers. Pro focuses on long-chain reasoning for complex workflows; Lite balances quality and speed; Mini optimizes throughput.
View release ↗
Compact 3B-parameter language model designed for efficient deployment with strong performance on reasoning and general tasks.
View model ↗
High-reasoning GGUF distillation of Qwen3-14B using Claude 4.5 Opus as the teacher model. Offers Opus-level reasoning in a locally runnable 14B package.
View model ↗
Open-weights 3.35B multilingual model optimized for balanced representation across 70+ languages, including many lower-resourced ones. Supports downstream adaptation and local deployment.
View model ↗
First brain-data foundation model: a 380M-param diffusion autoencoder trained on scalp-EEG signals for denoising, channel reconstruction, and novel signal prediction. A step toward thought-to-text.
View release ↗
Qwen3.5 now available through Alibaba Cloud Model Studio with API access, documentation, and enterprise deployment options.
View release ↗
Video, audio, and physics-native generation techniques shaping spatial computing.
Unified visual creation model supporting T2V, T2I, instruction-based video-to-video, and image editing. Built on diffusion + transformer architecture with multi-GPU distributed inference.
View model ↗
General-purpose image editing model with native editing from T2I foundations. Supports text style preservation, old photo restoration, multi-image editing, and virtual try-on. Includes a 1,673-pair bilingual benchmark.
View model ↗
Unified audio generation framework for speech, music, and sound effects with cross-modal control and composition capabilities.
View project ↗
Novel vector-to-pixel image generation approach enabling precise control over image synthesis from vector representations.
View project ↗
Advanced video generation technique using anchor-based temporal weaving for improved consistency and motion coherence in synthesized videos.
View project ↗
Learning-based unified video editing framework that provides consistent, high-quality edits across temporal sequences.
View project ↗
Framework that transforms code descriptions into interactive 3D worlds using AI-driven scene generation and spatial reasoning.
View project ↗
Google introduces AI-powered music generation capabilities within the Gemini ecosystem for creative audio production.
View release ↗
Unified 0.5B audio model generating speech, music, and sound in one channel. A custom 12.5 Hz tokenizer plus patch-by-patch compression yields 3.1 Hz inference. 93% Cantonese accuracy and SOTA emotion control.
View model ↗
English text-to-speech model with natural prosody and high-quality voice synthesis for production use cases.
View model ↗
Open-source text-to-speech model delivering premium voice quality with support for multiple speaking styles and emotional tones.
View model ↗
Lightweight TTS engine designed for fast inference and easy integration into applications. Open-source with a simple API.
View repo ↗
Embodied agents learning to act in complex virtual and hybrid worlds.
Personal AI assistant running on any OS/platform. Supports WhatsApp, Telegram, Slack, Discord, Signal, iMessage, and Teams with agent-to-agent sessions and a skills registry (ClawHub).
View repo ↗
Fully autonomous AI agent system for penetration testing. Multi-agent architecture with Langfuse integration, knowledge graphs (Graphiti), and containerized execution environments.
View repo ↗
Open-source DevOps agent in Rust that ships code on autopilot. Lives on machines 24/7 with adaptive intelligence, security hardening, and MCP/ACP protocol support.
View repo ↗
Context management framework for AI agents providing unified memory, retrieval, and state persistence across multi-turn interactions.
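As a generic illustration of what unified memory and retrieval means for a multi-turn agent, here is a toy store-and-rank sketch. The class name and the keyword-overlap scoring are invented for this example; they are not the project's actual API.

```python
# Toy agent-memory layer: append turns, then retrieve the most relevant
# ones for a new query. Real frameworks use embeddings and persistence;
# this sketch uses naive keyword overlap purely to show the shape of the API.
class AgentMemory:
    def __init__(self):
        self.turns = []  # ordered (role, text) pairs across the conversation

    def remember(self, role, text):
        self.turns.append((role, text))

    def retrieve(self, query, top_n=2):
        """Rank stored turns by word overlap with the query."""
        q = set(query.lower().split())
        scored = sorted(
            self.turns,
            key=lambda t: len(q & set(t[1].lower().split())),
            reverse=True,
        )
        return [text for _, text in scored[:top_n]]

mem = AgentMemory()
mem.remember("user", "my deploy target is eu-west-1")
mem.remember("assistant", "noted the region")
mem.remember("user", "use terraform for deploy")
hits = mem.retrieve("which region is the deploy target")
```

The value of such a layer is that the agent loop only ever calls `remember` and `retrieve`; swapping keyword overlap for vector search or a database changes nothing upstream.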
View repo ↗
AI-powered shopping platform with intelligent product discovery, comparison, and purchasing workflows for e-commerce automation.
View release ↗
Minimalist AI agent framework focused on zero-configuration setup and rapid prototyping for conversational AI applications.
View repo ↗
Ultra-lightweight agent runtime designed for resource-constrained environments with minimal memory footprint and fast startup.
View repo ↗
Frameworks, playbooks, and OSS repos that level up AI engineering velocity.
Sparse attention method for video generation DiTs using Monarch matrices. Achieves 95% attention sparsity with no quality loss and 1.4–11.8x speedup over FlashAttention. Enables real-time 16 FPS video on a single RTX 5090.
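A rough sketch of the structure behind Monarch matrices: an n-by-n dense map is replaced by two block-diagonal factors joined by a fixed permutation, cutting per-vector cost from O(n^2) to O(n^1.5). Sizes here are toy values, and MonarchRT's actual attention kernels are assumed to be far more involved than this.

```python
import numpy as np

# Monarch-style structured matmul: y = L P R x, where L and R are
# block-diagonal and P is a stride permutation. With b = sqrt(n) blocks
# of size b x b, cost per vector is 2*n*b multiply-adds instead of n*n.
rng = np.random.default_rng(0)
n = 16
b = 4                                        # block size, with n = b * b

L_blocks = rng.standard_normal((b, b, b))    # b blocks, each b x b
R_blocks = rng.standard_normal((b, b, b))

def monarch_apply(x):
    """Apply the structured map without ever materializing an n x n matrix."""
    x = x.reshape(b, b)
    x = np.einsum("ikj,ij->ik", R_blocks, x)  # block-diagonal R
    x = x.T.copy()                            # stride permutation P
    x = np.einsum("ikj,ij->ik", L_blocks, x)  # block-diagonal L
    return x.reshape(n)

y = monarch_apply(rng.standard_normal(n))
dense_cost, monarch_cost = n * n, 2 * n * b   # 256 vs 128 multiply-adds here
```

At toy size the saving is only 2x, but the n^1.5-vs-n^2 gap widens with sequence length, which is consistent with the larger speedups quoted at video-scale attention.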
View project ↗
Unified inference and post-training framework for accelerated video generation. Features sparse distillation (>50x denoising speedup), Video Sparse Attention, and FSDP2 scalable training across H100/A100/4090.
View repo ↗
Build modular and scalable LLM applications in Rust. Supports multiple providers with composable agents, tools, and RAG pipelines. Used by 500+ downstream projects.
View repo ↗
Specialized 3B OCR VLM trained on business documents and scientific articles. Supports full-page reading, grounded text detection with bounding boxes, and localized reading within user-specified regions.
View model ↗
Compact OCR model optimized for document digitization with high accuracy on complex layouts and multi-language text recognition.
View model ↗
Comprehensive blog post comparing cutting-edge open OCR models, evaluation methodology, and tools for running models locally and remotely.
View model ↗
AWS CDK infrastructure stacks for deploying AI workloads with production-ready patterns for compute, storage, and networking.
View repo ↗
High-performance vector encoding library from Alibaba optimized for large-scale similarity search and embedding operations.
View repo ↗
In-depth essay exploring the trajectory toward ubiquitous AI adoption, covering infrastructure scaling, edge deployment, and enterprise integration challenges.
View release ↗