New AI Model Outperforms ChatGPT in AGI Benchmark Tests


Scientists at Singapore-based artificial intelligence firm Sapient have announced a new model that shows stronger reasoning abilities than ChatGPT and other leading AI systems. Called the Hierarchical Reasoning Model (HRM), the system is inspired by how the human brain processes information at different speeds and levels.

The HRM uses only 27 million parameters and 1,000 training samples—far fewer than the billions of parameters used in most large language models (LLMs). Despite its smaller scale, it has delivered impressive results in advanced artificial general intelligence (AGI) benchmarks.

Strong Results in ARC-AGI Benchmarks

In the widely recognized ARC-AGI benchmark tests, HRM scored 40.3% on ARC-AGI-1, outperforming OpenAI’s o3-mini-high (34.5%), Anthropic’s Claude 3.7 (21.2%), and DeepSeek R1 (15.8%). On the tougher ARC-AGI-2 test, HRM achieved 5%, a score that also exceeded its competitors.

Unlike traditional LLMs that rely on “chain-of-thought” reasoning, HRM performs reasoning in a single forward pass. It combines two modules: a high-level planner for abstract thinking and a low-level processor for quick calculations, mirroring how different parts of the brain handle planning and fast reactions.
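The two-speed idea can be sketched in a few lines of code. This is a minimal illustration, not Sapient’s actual implementation: the dimensions, weight matrices, and update rules below are invented for clarity, where HRM’s real modules are learned neural networks. The key pattern is that the low-level state updates several times for every single update of the high-level state, all within one forward pass.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions; HRM's real modules are trained networks, not random matrices.
D = 8                  # hidden state size
N_HIGH, T_LOW = 4, 3   # slow high-level cycles, fast low-level steps per cycle

W_low = rng.normal(scale=0.3, size=(D, D))   # low-level ("fast") weights
W_high = rng.normal(scale=0.3, size=(D, D))  # high-level ("slow") weights

def forward(x):
    """One forward pass: the high-level state updates once per T_LOW low-level steps."""
    z_high = np.zeros(D)   # abstract plan, updated slowly
    z_low = np.zeros(D)    # detailed computation, updated quickly
    for _ in range(N_HIGH):
        # Low-level module iterates rapidly, conditioned on the current plan.
        for _ in range(T_LOW):
            z_low = np.tanh(W_low @ z_low + z_high + x)
        # High-level module updates once, summarizing the low-level result.
        z_high = np.tanh(W_high @ z_high + z_low)
    return z_high

out = forward(rng.normal(size=D))
print(out.shape)  # → (8,)
```

The nesting of the two loops is the point: planning and fast computation happen at different timescales inside a single pass, with no intermediate chain-of-thought text.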

Smarter Problem-Solving Approach

HRM also uses “iterative refinement,” starting with a rough solution and improving it through short bursts of reasoning. This approach allowed the model to solve Sudoku puzzles and find paths through mazes—tasks where most LLMs struggle.
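The refine-and-halt loop can be illustrated with a toy example. This sketch is invented for illustration and is not HRM’s algorithm: the `score` and `improve` functions stand in for the model’s learned refinement, and stopping when the score no longer improves stands in for its learned halting decision. The toy task repairs a Sudoku-style row until each digit 1–4 appears exactly once.

```python
def iterative_refine(score, improve, candidate, max_bursts=20):
    """Generic refine loop: run short improvement bursts until the
    score stops improving (a stand-in for a learned halting signal)."""
    best = score(candidate)
    for _ in range(max_bursts):
        new = improve(candidate)
        s = score(new)
        if s <= best:
            break               # no further improvement: halt
        candidate, best = new, s
    return candidate

# Toy task: repair a row so the digits 1..4 each appear exactly once,
# fixing one duplicated cell per burst (a Sudoku-row-style constraint).
def score(row):
    return len(set(row))        # number of distinct digits

def improve(row):
    missing = [d for d in (1, 2, 3, 4) if d not in row]
    if not missing:
        return row
    seen, out = set(), list(row)
    for i, d in enumerate(out):
        if d in seen:           # first duplicated cell found
            out[i] = missing[0]  # replace it with a missing digit
            break
        seen.add(d)
    return out

print(iterative_refine(score, improve, [1, 1, 1, 1]))  # → [1, 2, 3, 4]
```

Each burst makes the candidate only slightly better, but the loop converges to a valid solution, which is the essence of starting rough and refining rather than committing to one long reasoning chain.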

Caution and Verification

The findings were published on the arXiv preprint server and have not yet been peer-reviewed. Independent researchers from the ARC-AGI benchmark team confirmed the model’s results but suggested that its strong performance may be linked to a specific refinement process during training rather than the hierarchical design itself.

Even with these questions, HRM highlights a potential new direction for AI research, showing that smaller and more efficient models can compete with much larger systems when it comes to reasoning and problem-solving.