Vision-language-action (VLA) models have shown remarkable capabilities in robotic manipulation, but their performance is sensitive to the
action chunk length used during training, termed "horizon".
Our empirical study reveals an inherent trade-off: longer horizons provide stronger global foresight but degrade fine-grained accuracy, while shorter ones sharpen local control yet struggle on long-term tasks, implying that any fixed, single horizon is suboptimal.
To mitigate the trade-off, we propose a mixture of horizons (MoH) strategy.
MoH rearranges the action chunk into several segments with different horizons, processes them in parallel with a shared action transformer,
and fuses outputs with a lightweight linear gate.
It has three appealing benefits.
1) MoH exploits long-term foresight and short-term precision jointly within a single model, improving both performance and generalizability to complex tasks.
2) MoH is plug-and-play for full-attention action modules with minimal training or inference overhead.
3) MoH enables dynamic inference with adaptive horizons, which selects stable actions through cross-horizon consensus, achieving 2.5x higher throughput than baselines while preserving superior performance.
Extensive experiments on the flow-matching policies π0 and π0.5 and the one-step regression policy πreg demonstrate that MoH yields consistent and significant gains on both simulated and real-world tasks.
Notably, under the mixed-task setting, π0.5 with MoH reaches a new state of the art with a 99% average success rate on LIBERO after only 30k training iterations.
Following Occam's razor, we adopt the simplest possible implementation of the mixture-of-horizons strategy.
To begin with, the action-related input is rearranged into different horizons and processed in parallel by a shared action transformer.
Then, we introduce a linear gate head similar to the action projection head, with only \(2k\) parameters, to produce per-step, per-horizon weights to fuse horizon-wise predictions.
To prevent the gating head from collapsing onto a few preferred horizons, we also introduce a balance loss that encourages all horizons to be effectively utilized.
Notably, our mixture of horizons strategy is compatible with both Flow-Matching policies and One-Step policies with minimal training or inference overhead.
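To make the fusion concrete, the snippet below is a minimal PyTorch sketch of how such a mixture-of-horizons head could be wired: the same action transformer processes truncated copies of the action chunk (one per horizon), and a small linear gate produces per-step weights over the horizons that cover each step. The module names, default sizes, gate input, and the omission of vision-language conditioning and the flow-matching timestep are simplifying assumptions, not the actual implementation.

```python
# Minimal sketch of an MoH action head. Assumptions: the shared action
# transformer reprocesses a truncated copy of the action chunk per horizon,
# the gate reads the raw action tokens, and vision-language conditioning /
# the flow-matching timestep are omitted for brevity.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MoHActionHead(nn.Module):
    def __init__(self, d_model=512, action_dim=7, horizons=(5, 10, 25, 50)):
        super().__init__()
        self.horizons = sorted(horizons)
        # Stand-in for the base policy's shared action transformer.
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.action_transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.action_proj = nn.Linear(d_model, action_dim)  # action projection head
        # Linear gate head: one logit per horizon per step (its exact
        # parameterization and input in the paper may differ).
        self.gate = nn.Linear(d_model, len(self.horizons))

    def forward(self, action_tokens):
        """action_tokens: (B, H, d_model) embedded (noisy) action chunk."""
        B, H, _ = action_tokens.shape
        assert H == self.horizons[-1], "chunk length should match longest horizon"

        preds = []
        for h in self.horizons:
            feats = self.action_transformer(action_tokens[:, :h])  # shared weights
            pred = self.action_proj(feats)                          # (B, h, A)
            preds.append(F.pad(pred, (0, 0, 0, H - h)))             # pad to (B, H, A)
        preds = torch.stack(preds, dim=-1)                          # (B, H, A, K)

        # Per-step, per-horizon gate weights; horizons that do not cover a
        # step t (h <= t) are masked out before the softmax.
        gate_logits = self.gate(action_tokens)                      # (B, H, K)
        steps = torch.arange(H, device=action_tokens.device)
        cover = torch.tensor(self.horizons, device=steps.device) > steps[:, None]
        gate_logits = gate_logits.masked_fill(~cover, float("-inf"))
        weights = gate_logits.softmax(dim=-1)                       # (B, H, K)

        # Fuse horizon-wise predictions step by step.
        return (preds * weights.unsqueeze(2)).sum(dim=-1)           # (B, H, A)
```

The masking step ensures that only horizons longer than t can vote on step t, which keeps the softmax well defined and matches the per-step fusion described above.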
Results on LIBERO.
Mixture of Horizons yields consistent and significant gains across all baselines (\(\pi_0\), \(\pi_{0.5}\), \(\pi_{reg}\)).
\(\pi_{0.5}\) with MoH achieves a state-of-the-art 99% success rate on LIBERO with only 30k training iterations and a batch size of 32.
Interestingly, \(\pi_{reg}\), obtained by fine-tuning from the \(\pi_{0}\) base model, can even outperform the standard fine-tuned flow-matching \(\pi_{0}\), and achieves the best performance among regression- and classification-based VLA models.
Given that LIBERO's training and evaluation settings are highly in-distribution, this result indicates that a policy with a regression objective converges well on small-scale downstream tasks.
Results on RoboTwin2.0. We also evaluate MoH on 7 representative tasks from RoboTwin2.0. Results show that MoH not only boosts in-distribution convergence, but also enhances robustness and generalization to more challenging task configurations.
MoH enables a dynamic inference scheme for stable and fast execution. Specifically, each horizon is treated as a voter, and prefix actions that receive consistent support across horizons are identified, forming a self-truncating executable chunk while uncertain actions are deferred to the next replanning iteration. Notably, even when throughput is increased to 2.5× the default setting (5 steps), \(\pi_{0.5}\) with MoH under dynamic inference still outperforms the baseline \(\pi_{0.5}\).
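As a rough illustration of the consensus idea (the concrete criterion below is an assumption, not the paper's exact rule), one can measure per-step disagreement among the horizon-wise predictions and execute only the longest agreeing prefix, falling back to the shortest horizon as a lower bound:

```python
# Hedged sketch of cross-horizon consensus for dynamic inference. Assumed
# (not taken from the released code): disagreement is the worst L2 deviation
# of each covering horizon's prediction from the fused action, and the
# executed prefix is the longest run of steps under a tolerance, never
# shorter than the minimum horizon.
import torch


def consensus_prefix(fused, per_horizon, horizons, tol=0.05):
    """fused: (H, A) fused chunk; per_horizon: (K, H, A) zero-padded predictions."""
    H = fused.shape[0]
    steps = torch.arange(H)
    cover = torch.tensor(horizons)[:, None] > steps[None, :]       # (K, H)

    # Per-step disagreement: worst deviation among horizons covering the step.
    dev = (per_horizon - fused[None]).norm(dim=-1)                  # (K, H)
    dev = dev.masked_fill(~cover, 0.0)
    disagreement = dev.max(dim=0).values                            # (H,)

    # Longest agreeing prefix; always execute at least the shortest horizon.
    agree = (disagreement <= tol).long()
    n_exec = int(agree.cumprod(dim=0).sum().clamp(min=min(horizons)))
    return fused[:n_exec]      # self-truncating executable chunk
```

Under such a rule, stable phases yield long executable prefixes (fewer replanning calls, hence higher throughput), while uncertain phases truncate early and defer to the next replanning step.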
We visualize one rollout on LIBERO-Long under dynamic inference. For this trajectory, we display most timesteps together with the action-chunk lengths that are actually executed. A clear pattern emerges: around decision points, such as when the robot changes its movement direction or commits to approaching a new target object, and during fine-grained manipulation (e.g., grasping and lifting the bottle), the policy tends to select only the shortest horizon of 5 steps. In contrast, when the system is in a relatively stable and low-risk phase, such as translating the grasped object or moving the arm through free space toward a pre-grasp configuration, the executed chunks become noticeably longer.
We present the training and inference time cost of \(\pi_{0}\) and \(\pi_{0.5}\) under different horizon settings. Benefiting from data parallelism, MoH brings very little additional time overhead for both training and inference. Importantly, the inference latency is virtually unaffected, which means that MoH does not impact the control frequency and fully preserves the usability of VLA models.
To prevent the gating head from collapsing, we introduce a balance loss; please refer to Section 3.2 in the paper.
We present the horizon weights of \(\pi_{0.5}\) with MoH on LIBERO-Long task suite.
Without the balance loss, the gate head tends to assign higher weights to action chunks with longer horizons, because longer horizons participate in more steps during action mixture.
This introduces statistical and gradient bias during training and manifests as an imbalance in gating learning.
After introducing the balance loss, this bias is effectively suppressed, enabling the gating head to better leverage predictions from each horizon.
Meanwhile, because the balance loss acts only as a regularization term, it does not forcibly flatten the weights, thereby avoiding excessive averaging.
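The exact balance loss is given in Section 3.2 of the paper; purely for illustration, a generic regularizer in the same spirit penalizes the deviation of each horizon's average gate usage from uniform, normalized by the number of steps each horizon covers so that longer horizons are not favored simply because they participate in more steps:

```python
# Illustrative balance regularizer on the gate weights (assumption-laden
# sketch, not the paper's formula): penalize deviation of average per-horizon
# usage from uniform, while leaving per-step weights free to specialize.
import torch


def balance_loss(weights, cover):
    """weights: (B, H, K) per-step gate weights; cover: (H, K) validity mask."""
    # Average usage of each horizon over the steps it actually covers.
    usage = (weights * cover).sum(dim=(0, 1)) / cover.sum(dim=0).clamp(min=1)  # (K,)
    usage = usage / usage.sum()                       # normalize to a distribution
    uniform = torch.full_like(usage, 1.0 / usage.numel())
    # Soft penalty toward uniform usage; does not flatten per-step weights.
    return ((usage - uniform) ** 2).sum()
```

Because the penalty acts on average usage rather than on individual per-step weights, it discourages collapse without forcing the gate toward a flat mixture.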
For more ablation studies, please refer to Section 4.3 in the paper!
We also conduct real-world experiments on three tasks. These tasks jointly require instruction following, object relocation and rotation, and precise grasping and placement, providing a comprehensive evaluation of VLA models in real-world settings. As shown in Figure 10, across all three tasks and for both base models, the MoH strategy yields consistent performance gains.
@article{jing2025mixture_of_horizons,
title={Mixture of Horizons in Action Chunking},
author={Jing, Dong and Wang, Gang and Liu, Jiaqi and Tang, Weiliang and Sun, Zelong and Yao, Yunchao and Wei, Zhenyu and Liu, Yunhui and Lu, Zhiwu and Ding, Mingyu},
journal={arXiv preprint arXiv:2511.19433},
year={2025}
}