Why This Matters

Planning in high-dimensional continuous action spaces suffers from the curse of dimensionality and the curse of history, making real-time decision-making challenging even with advanced planning methods. The key innovation here is showing that learned temporal abstractions can dramatically reduce planning complexity without sacrificing decision quality, enabling fast planning in complex stochastic environments. The approach bridges the gap between neural representation learning and classical planning methods.

What We Did

This paper proposes the Latent Macro Action Planner (L-MAP), which addresses sequential decision-making in high-dimensional continuous action spaces through learned temporal abstractions. A state-conditioned Vector Quantized Variational Autoencoder (VQ-VAE) discretizes complex action sequences into a manageable set of discrete macro-actions, enabling efficient planning in a pre-constructed latent space. The framework combines Monte Carlo Tree Search (MCTS) with a separately learned prior model that proposes plausible latent actions, allowing effective exploration and exploitation under both deterministic and stochastic dynamics.
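The core operation behind the VQ-VAE step is vector quantization: a continuous embedding of a (state, action-sequence) pair is snapped to its nearest entry in a learned codebook, and the planner then searches over those discrete codes. A minimal sketch of that lookup, with a toy random codebook standing in for the learned one (the function name and shapes are illustrative, not from the paper's implementation):

```python
import numpy as np

def quantize(z, codebook):
    """Map a continuous latent z to its nearest codebook entry.

    z: (d,) continuous embedding of a (state, action-sequence) pair.
    codebook: (K, d) learned set of K discrete macro-action codes.
    Returns the index of the nearest code and the code vector itself;
    at planning time this index is the discrete action MCTS branches on.
    """
    dists = np.sum((codebook - z) ** 2, axis=1)  # squared L2 to each code
    k = int(np.argmin(dists))
    return k, codebook[k]

# Toy example: 4 codes in a 2-D latent space (random stand-in for a
# trained codebook).
rng = np.random.default_rng(0)
codebook = rng.normal(size=(4, 2))
z = codebook[2] + 0.01          # an embedding lying near code 2
k, code = quantize(z, codebook)  # k == 2, code == codebook[2]
```

Because each macro-action covers several primitive timesteps and the codebook is finite, the search tree's branching factor and depth both shrink, which is what keeps decision latency low as action dimensionality grows.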

Key Results

Evaluation across diverse continuous control tasks, including robotic manipulation and autonomous driving, shows that L-MAP significantly outperforms model-based baselines and performs on par with strong model-free actor-critic methods, all while maintaining low decision latency. The framework scales effectively to high-dimensional problems where traditional planning becomes infeasible.

Cite This Paper

@inproceedings{luo2025scalable,
  author = {Luo, Baiting and Pettet, Ava and Laszka, Aron and Dubey, Abhishek and Mukhopadhyay, Ayan},
  booktitle = {Proceedings of the 13th International Conference on Learning Representations},
  address = {Singapore},
  title = {Scalable Decision-Making In Stochastic Environments Through Learned Temporal Abstraction},
  year = {2025},
  organization = {International Conference on Learning Representations},
  abstract = {Sequential decision-making in high-dimensional continuous action spaces, particularly in stochastic environments, faces significant computational challenges. We explore this challenge in the traditional offline RL setting, where an agent must learn how to make decisions based on data collected through a stochastic behavior policy. We present \textit{Latent Macro Action Planner} (L-MAP), which addresses this challenge by learning a set of temporally extended macro-actions through a state-conditional Vector Quantized Variational Autoencoder (VQ-VAE), effectively reducing action dimensionality. L-MAP employs a (separate) learned prior model that acts as a latent transition model and allows efficient sampling of plausible actions. During planning, our approach accounts for stochasticity in both the environment and the behavior policy by using Monte Carlo tree search (MCTS). In offline RL settings, including stochastic continuous control tasks, L-MAP efficiently searches over discrete latent actions to yield high expected returns. Empirical results demonstrate that L-MAP maintains low decision latency despite increased action dimensionality. Notably, across tasks ranging from continuous control with inherently stochastic dynamics to high-dimensional robotic hand manipulation, L-MAP significantly outperforms existing model-based methods and performs on par with strong model-free actor-critic baselines, highlighting the effectiveness of the proposed approach in planning in complex and stochastic environments with high-dimensional action spaces.},
  acceptance = {32.8},
  category = {selective},
  contribution = {colab},
  keywords = {temporal abstraction, planning under uncertainty, continuous action spaces, latent representations, Monte Carlo tree search, reinforcement learning, stochastic dynamics}
}
Quick Info
Year: 2025
Keywords: temporal abstraction, planning under uncertainty, continuous action spaces, latent representations, Monte Carlo tree search, reinforcement learning, stochastic dynamics
Research Areas: scalable AI, planning, POMDP