AI Agent Evaluations

Performance results of AI coding agents on Next.js code generation and migration tasks, measuring success rate and execution time.


Last run date: April 6, 2026

Agent Performance Results

Model	Agent	Execution time	Success rate	Success rate with AGENTS.md *
Cursor Composer 2.5	Cursor	149.18s	92%	96%
Claude Opus 4.8	Claude Code	166.30s	88%	96%
GPT 5.5 Pro	Codex	777.79s	83%	83%
GPT 5.4 (xhigh)	Codex	219.37s	83%	92%
GPT 5.3 Codex (xhigh)	Codex	178.20s	83%	96%
MiniMax M3	OpenCode	181.30s	75%	96%
GLM 5.1	OpenCode	254.36s	75%	100%
Claude Opus 4.7 (max)	Claude Code	142.63s	75%	100%
Gemini 3.1 Pro Preview	Gemini CLI	244.70s	75%	96%
Claude Opus 4.6	Claude Code	186.96s	75%	100%
Cursor Composer 2.0	Cursor	113.53s	75%	96%
Gemini 3.0 Pro Preview	Gemini CLI	256.87s	67%	88%
Cursor Composer 1.5	Cursor	120.63s	67%	88%
Claude Sonnet 4.6	Claude Code	156.89s	58%	100%
GPT 5.2 Codex (xhigh)	Codex	148.75s	58%	83%
MiniMax M2.7	OpenCode	294.01s	50%	63%
Claude Sonnet 4.5	Claude Code	149.24s	50%	88%
Kimi K2.5	OpenCode	135.42s	21%	58%

* AGENTS.md provides bundled Next.js documentation for AI coding agents. The last column reports the success rate with this documentation available, reflecting the additional evals that passed when agents had access to it.
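
To make the effect of AGENTS.md easier to compare across agents, the gain can be expressed in percentage points. The sketch below is illustrative only: it assumes the last column is the total success rate with AGENTS.md, so the gain is that value minus the baseline success rate. The `EvalRow` type, its field names, and the `upliftPoints` helper are hypothetical and not part of the evaluation tooling; the four rows are copied from the table above.

```typescript
// Hypothetical shape for one table row; rates are percentages.
interface EvalRow {
  model: string;
  agent: string;
  seconds: number;      // execution time
  baseline: number;     // success rate without AGENTS.md
  withAgentsMd: number; // success rate with AGENTS.md (assumed to be a total)
}

// A few rows copied from the results table.
const rows: EvalRow[] = [
  { model: "Cursor Composer 2.5", agent: "Cursor", seconds: 149.18, baseline: 92, withAgentsMd: 96 },
  { model: "GPT 5.5 Pro", agent: "Codex", seconds: 777.79, baseline: 83, withAgentsMd: 83 },
  { model: "Claude Sonnet 4.6", agent: "Claude Code", seconds: 156.89, baseline: 58, withAgentsMd: 100 },
  { model: "Kimi K2.5", agent: "OpenCode", seconds: 135.42, baseline: 21, withAgentsMd: 58 },
];

// Gain in percentage points attributable to AGENTS.md, under the assumption above.
function upliftPoints(row: EvalRow): number {
  return row.withAgentsMd - row.baseline;
}

// Sort a copy so the original table order is preserved.
const ranked = [...rows].sort((a, b) => upliftPoints(b) - upliftPoints(a));

for (const row of ranked) {
  console.log(`${row.model} (${row.agent}): +${upliftPoints(row)} points`);
}
```

Under this reading, models with a low baseline can gain the most from the bundled documentation, while a model whose two rates match (GPT 5.5 Pro at 83% in both columns) shows no gain.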