Mixture of Horizons in Action Chunking

1 Renmin University of China
2 University of North Carolina at Chapel Hill
3 The Chinese University of Hong Kong
† Corresponding author
Code and models are now released! 🔥

TL;DR

  • VLA models' performance is sensitive to the action chunk length (horizon) used during training. A single fixed horizon induces an inherent trade-off between long-term foresight and short-term precision.
  • We propose Mixture of Horizons (MoH), a plug-and-play strategy that fuses multiple horizons within a single policy to inherit the strengths of both with minimal training or inference overhead.
  • MoH enables Dynamic Inference, selecting stable actions through cross-horizon consensus for higher efficiency and robustness.
Horizon Trade-off
Trade-off. Effect of action horizon on \(\pi_0\). Longer horizons facilitate structural foresight (beneficial for Goal/Long tasks), whereas shorter horizons ensure precise control (crucial for Spatial/Object tasks). Our MoH strategy alleviates this trade-off and raises overall performance.
MoH Concept
Mixture of Horizons. Action queries of multiple horizons are processed in parallel via a shared action transformer and integrated by a lightweight mixture layer. MoH simultaneously enables long-term foresight and short-term precision for VLAs.

Abstract

Vision-language-action (VLA) models have shown remarkable capabilities in robotic manipulation, but their performance is sensitive to the action chunk length used during training, termed the "horizon". Our empirical study reveals an inherent trade-off: longer horizons provide stronger global foresight but degrade fine-grained accuracy, while shorter ones sharpen local control yet struggle on long-term tasks, implying that a fixed single horizon is suboptimal. To mitigate the trade-off, we propose a mixture of horizons (MoH) strategy. MoH rearranges the action chunk into several segments with different horizons, processes them in parallel with a shared action transformer, and fuses outputs with a lightweight linear gate.
It has three appealing benefits.
1) MoH exploits long-term foresight and short-term precision jointly within a single model, improving both performance and generalizability to complex tasks.
2) MoH is plug-and-play for full-attention action modules with minimal training or inference overhead.
3) MoH enables dynamic inference with adaptive horizons, which selects stable actions through cross-horizon consensus, achieving 2.5x higher throughput than baselines while preserving superior performance.
Extensive experiments on the flow-matching policies \(\pi_0\) and \(\pi_{0.5}\) and the one-step regression policy \(\pi_{reg}\) demonstrate that MoH yields consistent and significant gains in both simulation and real-world tasks. Notably, under the mixed-task setting, \(\pi_{0.5}\) with MoH reaches a new state of the art with a 99% average success rate on LIBERO after only 30k training iterations.

Framework - Mixture of Horizons

Following Occam's razor, we implement the mixture of horizons strategy in the simplest possible way.
The action-related input is first rearranged into segments of different horizons and processed in parallel by a shared action transformer. We then introduce a linear gate head, similar to the action projection head and with only \(2k\) parameters, which produces per-step, per-horizon weights to fuse the horizon-wise predictions. To prevent the gating head from collapsing onto a few preferred horizons, we also introduce a balance loss that encourages all horizons to be effectively utilized.
Notably, our mixture of horizons strategy is compatible with both flow-matching policies and one-step policies, with minimal training or inference overhead. A minimal sketch of the fusion step is given below.
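For concreteness, here is a minimal PyTorch sketch of the fusion step. It assumes illustrative horizons of (5, 10, 20), a generic transformer as the shared action module, and a one-logit-per-step gate; module names and shapes are our own placeholders, not the released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MoHFusion(nn.Module):
    """Sketch of mixture-of-horizons fusion: shared action transformer + linear gate."""
    def __init__(self, feat_dim=256, action_dim=7, horizons=(5, 10, 20)):
        super().__init__()
        self.horizons = sorted(horizons)
        self.max_h = max(horizons)
        # Stand-in for the full-attention action module of the base VLA policy.
        self.action_transformer = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(feat_dim, nhead=8, batch_first=True),
            num_layers=2,
        )
        self.action_head = nn.Linear(feat_dim, action_dim)  # per-step action projection
        self.gate_head = nn.Linear(feat_dim, 1)             # per-step, per-horizon gate logit

    def forward(self, action_queries):
        # action_queries: (B, max_h, feat_dim) queries for the full action chunk.
        preds, logits = [], []
        for h in self.horizons:
            # Each horizon processes only its own segment with the *shared* transformer;
            # in practice the segments are batched so all horizons run in parallel.
            feats = self.action_transformer(action_queries[:, :h])
            pad = self.max_h - h
            preds.append(F.pad(self.action_head(feats), (0, 0, 0, pad)))
            # Padded steps get -inf logits so they receive zero weight after the softmax.
            logits.append(F.pad(self.gate_head(feats), (0, 0, 0, pad), value=float("-inf")))
        preds = torch.stack(preds, dim=-2)     # (B, max_h, n_horizons, action_dim)
        logits = torch.stack(logits, dim=-2)   # (B, max_h, n_horizons, 1)
        weights = logits.softmax(dim=-2)       # normalize across horizons at each step
        fused = (weights * preds).sum(dim=-2)  # (B, max_h, action_dim)
        return fused, weights.squeeze(-1)

fused, gate_weights = MoHFusion()(torch.randn(2, 20, 256))  # fused: (2, 20, 7)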

Architecture
Overview of the Mixture of Horizons framework.

Simulation Experiments

Results on LIBERO. Mixture of Horizons yields consistent and significant gains across all baselines (\(\pi_0\), \(\pi_{0.5}\), \(\pi_{reg}\)).
\(\pi_{0.5}\) with MoH achieves a state-of-the-art 99% average success rate on LIBERO with only 30k training iterations and a batch size of 32.
Interestingly, \(\pi_{reg}\), obtained by fine-tuning from the \(\pi_{0}\) base model, can even outperform the standard fine-tuned flow-matching \(\pi_{0}\), and achieves the best performance among regression- and classification-based VLA models. Given that LIBERO's training and evaluation settings are highly in-distribution, this result indicates that a policy trained with a regression objective converges well on small-scale downstream tasks.

LIBERO Results
Comparison of VLA models on LIBERO. Iters abbreviates training iterations. Best results are in bold. MoH consistently improves flow-matching and regression-based baselines. † UniVLA and X-VLA use large training batch sizes of 192 and 128, respectively.

Results on RoboTwin2.0. We also evaluate MoH on 7 representative tasks from RoboTwin2.0. Results show that MoH not only boosts in-distribution convergence but also enhances robustness and generalization to more challenging task configurations.

RoboTwin Comparison
Performance on RoboTwin 2.0 (Easy & Hard settings). \(\pi_0\) with MoH consistently outperforms the base \(\pi_0\) model.

Qualitative Results on LIBERO

SPATIAL: pick up the black bowl on the wooden cabinet and place it on the plate
OBJECT: pick up the chocolate pudding and place it in the basket
GOAL: open the top drawer and put the bowl inside
LONG: put both moka pots on the stove
LONG: put both alphabet soup and cream cheese box in the basket
LONG: put both alphabet soup and tomato sauce in basket
LONG: put white mug on the plate and put chocolate pudding to the right of the plate

Qualitative Results on RoboTwin2.0

Task 1: place shoe
Task 2: move can pot
Task 3: click alarmclock
Task 4: click bell
Task 5: move playingcard away
Task 6: open microwave
Task 7: stack blocks two

Dynamic Inference via Horizon Consensus

MoH enables a dynamic inference scheme for stable and fast inference. Specifically, each horizon is treated as a voter, and prefix actions that receive consistent support across horizons are identified, forming a self-truncating executable chunk while uncertain actions are deferred to the next replanning iteration (see the sketch below). Notably, even when the throughput is increased to 2.5× the default setting (executing 5 steps per chunk), \(\pi_{0.5}\) with MoH under dynamic inference still outperforms the baseline \(\pi_{0.5}\).

Algorithm
Algorithm of dynamic inference via cross-horizon consensus.
Dynamic Inference Overview
Our strategy integrates action chunks of multiple horizons via a shared action transformer and a lightweight mixture gating mechanism.
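As a rough illustration, the sketch below approximates the consensus test with a simple per-step agreement check among the horizon-wise predictions; the exact criterion and the role of the scaling ratio r follow the algorithm above, while the distance threshold, minimum prefix length, and function name here are assumptions.

import torch

def consensus_prefix(horizon_preds, fused_chunk, agree_thresh=0.05, min_steps=5):
    # horizon_preds: list of (h_i, action_dim) tensors, one prediction per horizon.
    # fused_chunk:   (max_h, action_dim) gate-fused actions from the MoH policy.
    n_exec = 0
    for t in range(fused_chunk.shape[0]):
        # Horizons that cover step t act as its voters.
        voters = [pred[t] for pred in horizon_preds if pred.shape[0] > t]
        if len(voters) < 2:
            break                              # no consensus possible with a single voter
        voters = torch.stack(voters)           # (n_voters, action_dim)
        spread = (voters - voters.mean(dim=0)).norm(dim=-1).max()
        if spread > agree_thresh:
            break                              # defer uncertain steps to the next replanning
        n_exec += 1
    n_exec = max(n_exec, min_steps)            # never shorter than the default 5-step prefix
    return fused_chunk[:n_exec]

# Example with illustrative horizons of 5, 10, and 20 steps and 7-D actions.
preds = [torch.randn(5, 7), torch.randn(10, 7), torch.randn(20, 7)]
executable = consensus_prefix(preds, torch.randn(20, 7))   # shape (n_exec, 7), n_exec >= 5

Because every horizon predicts the shared early steps, agreement on those steps is a cheap stability signal: smooth, low-risk motions keep the horizons consistent and yield long executable prefixes, while decision points break the consensus and trigger earlier replanning.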

We visualize one rollout on LIBERO-Long under dynamic inference. For this trajectory, we display most of the timesteps together with the action-chunk lengths that are actually executed. A clear pattern emerges: around decision points, such as when the robot changes its movement direction or commits to approaching a new target object, and during fine-grained manipulation (e.g., grasping and lifting the bottle), the policy tends to select only the shortest horizon of 5 steps. In contrast, when the system is in a relatively stable, low-risk phase, such as translating the grasped object or moving the arm through free space toward a pre-grasp configuration, the executed chunks become noticeably longer.

Dynamic Inference Stats
Example of dynamic inference on LIBERO-Long. \(\pi_{0.5}\) with MoH runs dynamic inference with scaling ratio r = 1.1. After each action-chunk prediction, only the prefix actions with horizon consensus are executed. Shorter chunks are selected near decision points and during fine-grained manipulation, whereas longer chunks are used during smooth, low-risk motions.
Default Inference: prefix 5 actions executed
Dynamic Inference: select executable actions via horizon consensus

Latency Comparison

We present the training and inference time cost of \(\pi_{0}\) and \(\pi_{0.5}\) under different horizon settings. Benefiting from data parallelism, MoH brings very little additional time overhead for both training and inference. Importantly, the inference latency is virtually unaffected, which means that MoH does not impact the control frequency and fully preserves the usability of VLA models.

Training and Inference Efficiency
Visualization of the overhead under different horizon settings.

Effect of Balance Loss

To prevent collapse of the gating head, we introduce a balance loss; please refer to Section 3.2 in the paper.
We present the horizon weights of \(\pi_{0.5}\) with MoH on the LIBERO-Long task suite. Without the balance loss, the gate head tends to assign higher weights to action chunks with longer horizons, because longer horizons participate in more steps during action mixture. This introduces a statistical and gradient bias during training and manifests as imbalanced gate learning. After introducing the balance loss, this bias is effectively suppressed, enabling the gating head to better leverage predictions from each horizon. Meanwhile, because the balance loss acts only as a regularization term, it does not forcibly flatten the weights, thereby avoiding excessive averaging. A sketch of one plausible form is given below.
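As one plausible instantiation (the exact formulation is given in Section 3.2 of the paper), the regularizer sketched below penalizes the deviation of each horizon's average gate weight, computed over the steps that horizon covers, from a uniform target; the specific form and the masking convention are our assumptions.

import torch

def balance_loss(gate_weights, active_mask):
    # gate_weights: (B, max_h, n_horizons) per-step, per-horizon weights from the gate head.
    # active_mask:  (max_h, n_horizons) with 1 where a horizon covers a step, else 0.
    batch = gate_weights.shape[0]
    # Average weight each horizon receives over the steps it actually covers.
    usage = (gate_weights * active_mask).sum(dim=(0, 1)) / (active_mask.sum(dim=0) * batch)
    target = torch.full_like(usage, 1.0 / usage.shape[-1])   # uniform usage target
    # Soft penalty: nudges average usage toward uniform without flattening per-step weights.
    return ((usage - target) ** 2).sum()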

Horizon Weights
Visualization of horizon weights of \(\pi_{0.5}\) with MoH on the LIBERO-Long task suite. The weights of H3 drop to 0 at steps 4 and 5, since that horizon no longer covers those steps.

For more ablation studies, please refer to Section 4.3 in the paper!

Real-World Experiments

We also conduct real-world experiments on three tasks. These tasks jointly require instruction following, object relocation and rotation, and precise grasping and placement, providing a comprehensive evaluation of VLA models in real-world settings. As shown in Figure 10, across all three tasks and for both base models, the MoH strategy yields consistent performance gains.

Real World Setup and Results
Experimental settings and results in real-world scenarios.

Qualitative Comparisons

Baseline (\(\pi_0\))
Ours (\(\pi_0\) + MoH)
Baseline (\(\pi_{0.5}\))
Ours (\(\pi_{0.5}\) + MoH)

Citation

@article{jing2025mixture_of_horizons,
  title={Mixture of Horizons in Action Chunking},
  author={Jing, Dong and Wang, Gang and Liu, Jiaqi and Tang, Weiliang and Sun, Zelong and Yao, Yunchao and Wei, Zhenyu and Liu, Yunhui and Lu, Zhiwu and Ding, Mingyu},
  journal={arXiv preprint arXiv:2511.19433},
  year={2025}
}