Fast Thinking for Large Language Models

Zheng, Haoyu; Wang, Zhuonan; Yuan, Yuqian; Lin, Tianwei; Zhang, Wenqiao; Lv, Zheqi; Li, Juncheng; Tang, Siliang; Zhuang, Yueting; He, Hongyang

arXiv:2509.23633 (cs)

[Submitted on 28 Sep 2025]

Abstract: Reasoning-oriented Large Language Models (LLMs) often rely on generating explicit tokens step by step, and their effectiveness typically hinges on large-scale supervised fine-tuning or reinforcement learning. While Chain-of-Thought (CoT) techniques substantially enhance performance on complex reasoning tasks, they remain inefficient, requiring long reasoning traces that increase latency and token usage. In this work, we introduce Latent Codebooks for Fast Thinking, a framework that uses concise CoT sketches only during training to learn a codebook of discrete strategy priors. At inference, the model conditions on a handful of continuous thinking vectors distilled from the codebook in a single pass, enabling strategy-level guidance without producing explicit reasoning tokens. To complement this design, we propose GainRouter, a lightweight routing mechanism that adaptively switches between fast, codebook-guided inference and slow, explicit reasoning, thereby suppressing overthinking and reducing unnecessary token generation. Experiments across multiple reasoning benchmarks show that our approach achieves competitive or superior accuracy while substantially lowering inference cost, offering a practical path toward efficient and controllable reasoning in large language models.
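
The abstract does not say how the thinking vectors are distilled from the codebook or how GainRouter makes its decision. The sketch below is only one plausible reading, under loudly stated assumptions. It assumes that each of a few hypothetical "slot" queries, shifted by the input's hidden representation, soft-attends over the codebook, so that every thinking vector is a convex combination of codebook entries computed in a single pass. It also assumes that the router compares a logistic estimate of the accuracy gained by slow reasoning against a per-token cost. The names `thinking_vectors`, `gain_router`, `slot_queries`, and `cost_per_token` are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    top = max(scores)
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def thinking_vectors(hidden: Vector, slot_queries: Sequence[Vector],
                     codebook: Sequence[Vector],
                     temperature: float = 1.0) -> List[Vector]:
    """HYPOTHETICAL distillation step: each slot query, shifted by the input's
    hidden state, soft-attends over the codebook entries. Every output is a
    convex combination of codebook vectors, and all are computed in one pass."""
    dim = len(codebook[0])
    vectors = []
    for slot in slot_queries:
        query = [h + s for h, s in zip(hidden, slot)]
        weights = softmax([dot(query, code) for code in codebook], temperature)
        mixed = [sum(w * code[i] for w, code in zip(weights, codebook))
                 for i in range(dim)]
        vectors.append(mixed)
    return vectors


def gain_router(features: Vector, weights: Vector, bias: float,
                extra_tokens: float, cost_per_token: float) -> str:
    """HYPOTHETICAL routing rule: a logistic scorer estimates how much slow,
    explicit reasoning would help. The router takes the slow path only if
    that estimate outweighs the token cost of the extra reasoning trace."""
    predicted_gain = 1.0 / (1.0 + math.exp(-(dot(features, weights) + bias)))
    net_benefit = predicted_gain - cost_per_token * extra_tokens
    return "slow" if net_benefit > 0 else "fast"


if __name__ == "__main__":
    codebook = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # toy 3-entry codebook
    slots = [[0.0, 0.0], [0.0, 2.0]]                  # two toy slot queries
    print(thinking_vectors([2.0, 0.0], slots, codebook))
    print(gain_router([1.0, 0.5], [2.0, -1.0], 0.0,
                      extra_tokens=500, cost_per_token=0.001))  # slow
```

In this toy run the estimated gain is about 0.82 and the token cost is 0.5, so the router picks the slow path. Doubling `cost_per_token` tips the decision to the fast path. That is the overthinking trade-off the abstract describes, expressed here with a made-up cost model.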

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2509.23633 [cs.CL]
	(or arXiv:2509.23633v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2509.23633

Submission history

From: Haoyu Zheng
[v1] Sun, 28 Sep 2025 04:19:48 UTC (952 KB)