AI that reasons under partial observability and adapts to changing worlds.
The planning algorithms described in our neuro-symbolic planning research assume they know the current state of the world. In practice, they rarely do.
A transit dispatcher sees bus GPS positions updated every 30 seconds — but GPS has noise, lag, and dead zones in tunnels. Between updates, a bus might have picked up a surge of riders, hit unexpected congestion, or broken down. The dispatcher’s picture of the system is always incomplete and slightly stale. An emergency response coordinator knows where fire stations are, but doesn’t know where the next incident will strike, how traffic will affect response times, or whether a unit will become unavailable mid-shift. An energy manager scheduling EV charging can’t directly observe building occupancy, consumption at unmetered loads, or how electricity prices will evolve through the day.
This is the problem of partial observability: the decision-maker cannot see the true state of the world and must instead maintain a belief — a probability distribution over possible states — that gets updated as new observations arrive. Partially observable Markov decision processes (POMDPs) are the formal framework for this setting. They are elegant in theory but notoriously hard to solve in practice, because the belief state itself is continuous and high-dimensional even when the underlying state space is finite.
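As a concrete (and deliberately tiny) illustration of the belief recursion, consider a three-state model of bus crowding. The transition and observation matrices below are invented for this sketch, not drawn from any deployed system:

```python
import numpy as np

# One step of the POMDP belief recursion: b'(s') ∝ O(o|s') · Σ_s T(s'|s) b(s).
def belief_update(belief, T, O, obs):
    """Predict with the transition model, correct with the observation model."""
    predicted = T.T @ belief                # Σ_s T(s → s') b(s)
    unnormalized = O[:, obs] * predicted    # weight by observation likelihood
    return unnormalized / unnormalized.sum()

# Hidden state: the bus is 'empty' (0), 'half-full' (1), or 'full' (2).
T = np.array([[0.7, 0.3, 0.0],             # T[s, s'] = P(next state s' | state s)
              [0.1, 0.7, 0.2],
              [0.0, 0.3, 0.7]])
O = np.array([[0.8, 0.2],                  # O[s, o] = P(observation o | state s);
              [0.5, 0.5],                  # o = 1 means "camera looks crowded"
              [0.1, 0.9]])

b = np.full(3, 1 / 3)                      # start fully uncertain
b = belief_update(b, T, O, obs=1)          # after seeing "looks crowded",
                                           # belief mass shifts toward 'full'
```

Even in this three-state toy, the belief is a point in a continuous simplex; with hundreds of vehicles the belief space becomes the high-dimensional object that makes POMDPs hard.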
Compounding this, the world is non-stationary: the dynamics that govern how states evolve and how observations are generated change over time. Driving patterns shift with seasons. Demand for transit surges during events and drops during holidays. Construction reroutes traffic for months. A policy or model calibrated last quarter may be misleading today. Most decision-making algorithms assume the world’s rules are fixed — an assumption that fails routinely in the systems we work with.
Our research tackles both challenges: making POMDP solving tractable at the scales required by real cities, and building algorithms that detect and adapt when the environment itself changes.
The quality of any planning algorithm depends on the quality of the state estimate it starts from. If the planner thinks a bus is empty when it’s actually full, or thinks traffic is clear when it’s gridlocked, even a perfect planning algorithm will make bad decisions. State estimation — maintaining an accurate belief about the world from noisy, incomplete observations — is therefore a critical upstream problem.
Traditional approaches to belief maintenance in POMDPs use particle filters: they represent the belief as a collection of sampled states, weighted by how well they match the observations. This works adequately for small problems, but degrades in high-dimensional domains. When the state space is large (hundreds of vehicles, thousands of road segments, multiple sensor streams), particles become sparse, weights degenerate, and the belief estimate drifts from reality.
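The degeneracy is easy to reproduce. In the toy Gaussian model below (our illustration, not taken from any paper cited here), the effective sample size of an importance-weighted particle set collapses as the state dimension grows:

```python
import numpy as np

# As dimension grows, importance weights concentrate on a few particles;
# the effective sample size ESS = 1 / Σ_i w_i² collapses toward 1.
rng = np.random.default_rng(0)

def effective_sample_size(log_weights):
    w = np.exp(log_weights - log_weights.max())   # stabilize before exponentiating
    w /= w.sum()
    return 1.0 / np.sum(w ** 2)

n_particles = 500
ess = {}
for dim in (1, 10, 50):
    particles = rng.normal(size=(n_particles, dim))   # samples from the prior
    observation = np.full(dim, 0.5)
    # Log-likelihood of the observation under unit-noise Gaussian sensors:
    log_w = -0.5 * np.sum((particles - observation) ** 2, axis=1)
    ess[dim] = effective_sample_size(log_w)
```

With 500 particles, the effective sample size drops by orders of magnitude between 1 and 50 dimensions; almost all particles carry negligible weight, which is exactly the regime real CPS state spaces occupy.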
Our ESCORT framework (NeurIPS 2025) addresses this through Stein-variational inference — a technique that pushes particles toward high-probability regions of the belief space using gradient information, rather than relying on random resampling. ESCORT extends this with correlation-aware projections that preserve the statistical dependencies between state variables (e.g., the relationship between congestion on adjacent road segments) and temporal consistency constraints that prevent the belief from jumping erratically between timesteps. The result is belief tracking that scales to the high-dimensional, multi-modal distributions that arise in real CPS applications.
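To illustrate the underlying mechanism, here is vanilla Stein Variational Gradient Descent on a toy Gaussian target. ESCORT's correlation-aware projections and temporal consistency constraints are not shown; this sketch captures only the core idea that particles move along gradients of the log-density plus a kernel repulsion term, instead of being resampled:

```python
import numpy as np

def svgd_step(X, grad_logp, lr=0.5):
    """One SVGD update: kernel-weighted attraction toward high density,
    plus a repulsion term that keeps particles from collapsing."""
    n = X.shape[0]
    diff = X[:, None, :] - X[None, :, :]               # diff[j, i] = x_j - x_i
    sq = np.sum(diff ** 2, axis=-1)
    h = np.median(sq) / np.log(n + 1)                  # median-heuristic bandwidth
    K = np.exp(-sq / h)                                # RBF kernel k(x_j, x_i)
    grad_K = (-2.0 / h) * diff * K[..., None]          # ∇_{x_j} k(x_j, x_i)
    phi = (K @ grad_logp(X) + grad_K.sum(axis=0)) / n  # attraction + repulsion
    return X + lr * phi

grad_logp = lambda X: -X                               # target: standard Gaussian

rng = np.random.default_rng(1)
X = rng.normal(loc=5.0, size=(100, 2))                 # particles start far off
for _ in range(1000):
    X = svgd_step(X, grad_logp)
# The particle mean drifts toward the target mean at the origin, while the
# repulsion term keeps the set spread out rather than collapsed onto one mode.
```

The appeal for belief tracking is that no particle is ever discarded: every particle is steered by gradient information, which avoids the weight degeneracy of resampling-based filters.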
A complementary challenge arises when observations themselves are noisy or arrive at different rates. Traffic sensors update every 30 seconds; weather data comes hourly; demand forecasts update every 15 minutes. Our AIROAS method (ICAPS 2025) handles this by constructing a sequence of bridge distributions between what the model predicts and what the observations suggest, gradually annealing from prior to posterior rather than making a single large update that can cause particle degeneracy. This is particularly important in online POMDP planning, where belief quality at deeper search nodes directly affects the planner’s ability to reason about future consequences.
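The following sketch shows generic tempered resampling in the spirit of those bridge distributions. This is textbook annealed importance resampling with invented constants, not AIROAS's exact algorithm:

```python
import numpy as np

# Rather than applying a sharp observation likelihood in one update, raise it
# to increasing powers 0 = β_0 < ... < β_K = 1 and resample at each stage, so
# particles migrate gradually from the predicted belief to the posterior.
rng = np.random.default_rng(2)

def log_likelihood(x, obs, noise=0.3):
    return -0.5 * ((x - obs) / noise) ** 2             # precise (low-noise) sensor

def annealed_resample(particles, obs, betas, jitter=0.05):
    for b_prev, b_next in zip(betas[:-1], betas[1:]):
        # Incremental weights bridge two adjacent tempered distributions.
        log_w = (b_next - b_prev) * log_likelihood(particles, obs)
        w = np.exp(log_w - log_w.max())
        w /= w.sum()
        idx = rng.choice(len(particles), size=len(particles), p=w)
        # A small jitter after resampling staves off sample impoverishment.
        particles = particles[idx] + rng.normal(scale=jitter, size=len(particles))
    return particles

prior = rng.normal(loc=0.0, scale=2.0, size=2000)      # predicted belief
posterior = annealed_resample(prior, obs=4.0, betas=np.linspace(0.0, 1.0, 6))
# The belief shifts toward the surprising observation near 4.0 without the
# single-step weight collapse a plain bootstrap filter would suffer here.
```

Each stage asks the particles to move only a fraction of the way toward the observation, which is what keeps the intermediate weight distributions well conditioned.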
For applications where planning must happen under tight time constraints — such as UAV search and rescue, where every second of flight time matters — our Shrinking POMCP algorithm reduces the computational cost of belief-space tree search by detecting when distinct world states lead to equivalent optimal actions and collapsing them in the search tree.
Even with perfect state estimation, a planner will fail if its model of the world is wrong. Non-stationarity — changes in the environment’s dynamics over time — is pervasive in the systems we study.
Our earliest encounter with this problem came in emergency response dispatch (ICCPS 2019), where we built an online decision-theoretic pipeline for the Nashville Fire Department. The system needed to predict where incidents would occur and dispatch responders accordingly — but incident patterns shift with weather, time of day, road conditions, and special events. A model trained on last month’s data could be dangerously wrong today. We addressed this by making the entire pipeline online: incident prediction models based on survival analysis were continuously updated with streaming data, and the dispatch planner (formulated as a semi-Markov decision process) incorporated these updated predictions in real time. This work demonstrated that the gap between offline models and reality was the primary bottleneck — not the planning algorithm itself.
The same non-stationarity challenge appeared in public transit when we partnered with Nashville’s WeGo and Chattanooga’s CARTA. Transit dispatch involves positioning reserve vehicles to cover anticipated disruptions — but disruption patterns change with construction, detours, driver shortages, and seasonal ridership shifts. Our ICCPS 2024 work formulated this as an online stationing and dispatch problem, using Monte Carlo Tree Search with generative models that sample possible futures. The key insight was that the generative models themselves must be updated as conditions change: a demand model from before a route change will produce misleading future scenarios. The system won the Best Paper Award at ICCPS 2024, validating that principled online adaptation to changing transit conditions could outperform the ad-hoc dispatch strategies that agencies had relied on for decades.
More broadly, vehicle routing problems are inherently non-stationary. Traffic conditions, rider demand, and vehicle availability all fluctuate continuously. Our work on paratransit and microtransit routing confronts this directly: an offline route plan computed in the morning becomes suboptimal as real-time bookings, cancellations, and traffic updates arrive throughout the day. The offline-online hybrid approach we developed for paratransit computes a baseline schedule, then continuously adapts it as conditions evolve — a practical embodiment of the non-stationarity problem applied to daily transit operations.
A recurring obstacle in this field is the lack of standardized evaluation. Most reinforcement learning and planning benchmarks assume stationary environments — the dynamics never change. This makes it difficult to compare algorithms on their ability to adapt, which is the core skill needed for real-world deployment.
Our NS-Gym toolkit (NeurIPS 2025) addresses this gap. NS-Gym is an open-source benchmarking framework, built on the widely-used Gymnasium interface, that provides standardized non-stationary environments where reward structures and transition dynamics change over time according to configurable patterns. The toolkit separates the evolution of environmental parameters from the agent’s decision-making module, enabling researchers to systematically vary the type, frequency, and magnitude of environmental changes and measure how different algorithms respond. We benchmark six algorithmic approaches from prior work and provide standardized metrics for evaluating adaptability and robustness. NS-Gym is, to our knowledge, the first comprehensive toolkit designed explicitly for non-stationary MDPs.
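The separation NS-Gym enforces can be illustrated with a toy environment. The class and method names below are ours, invented for this sketch rather than NS-Gym's actual API, which builds on the Gymnasium interface:

```python
import random

# The key design choice: the process that evolves the environment's parameters
# is kept separate from the agent-facing step() interface.
class ParameterScheduler:
    """Drifts a 'slip probability' upward over time, hidden from the agent."""
    def __init__(self, start=0.0, drift=0.01, limit=0.5):
        self.p, self.drift, self.limit = start, drift, limit

    def update(self):
        self.p = min(self.limit, self.p + self.drift)
        return self.p

class NonStationaryCorridor:
    """1-D corridor: actions are +1/-1, but a move flips with probability p(t)."""
    def __init__(self, scheduler, length=10):
        self.scheduler, self.length, self.pos = scheduler, length, 0

    def step(self, action):
        p = self.scheduler.update()        # the transition model drifts each step
        if random.random() < p:
            action = -action               # "slip": dynamics are non-stationary
        self.pos = max(0, min(self.length, self.pos + action))
        reward = 1.0 if self.pos == self.length else 0.0
        return self.pos, reward

random.seed(0)
env = NonStationaryCorridor(ParameterScheduler())
for _ in range(100):
    state, reward = env.step(+1)           # a fixed policy degrades as p grows
```

Because the scheduler is a separate object, an experimenter can swap in abrupt, gradual, or cyclic change patterns without touching the agent or the environment's step logic, which is the property that makes systematic adaptability benchmarks possible.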
Partial observability and non-stationarity are not isolated problems — they are upstream challenges that directly determine whether planning algorithms succeed or fail in practice. The belief representations developed through ESCORT and AIROAS feed into the neuro-symbolic planners described in our planning research. The non-stationarity detection and adaptation mechanisms inform when learned policies should be trusted versus when online search should take over (as in PA-MCTS and ADA-MCTS). And the benchmarks provided by NS-Gym give us rigorous ways to evaluate whether our planning algorithms actually handle changing environments or merely perform well on static test problems.
Together, this body of work ensures that the entire decision-making pipeline — from raw observations through state estimation through planning through execution — is robust to the messiness of the real world.
Selected Publications:
@inproceedings{zhang2025escortefficientsteinvariationalsliced,
title = {ESCORT: Efficient Stein-variational and Sliced Consistency-Optimized Temporal Belief Representation for POMDPs},
author = {Zhang, Yunuo and Luo, Baiting and Mukhopadhyay, Ayan and Karsai, Gabor and Dubey, Abhishek},
booktitle = {Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS'25)},
year = {2025},
url = {https://arxiv.org/abs/2510.21107},
eprint = {2510.21107},
archiveprefix = {arXiv},
primaryclass = {cs.LG},
what = {ESCORT is a particle-based framework for belief approximation in partially observable Markov decision processes that addresses the challenge of representing complex, multi-modal belief distributions in high-dimensional spaces. The approach extends Stein Variational Gradient Descent with correlation-aware projections and temporal consistency constraints, enabling particles to concentrate in high-uncertainty regions while preserving learned correlation structures. ESCORT dynamically adapts to belief landscape complexity without requiring resampling, maintaining both representational accuracy and computational efficiency.},
why = {Traditional belief representation methods in POMDPs struggle with high-dimensional, multi-modal distributions due to kernel degeneracy and the need for excessive particles. Recent neural and parametric approaches fail to capture intricate correlation patterns essential for accurate decision-making. ESCORT is innovative because it combines principled geometric methods (sliced Wasserstein distance) with temporal consistency regularization, enabling particle-based methods to scale to complex belief spaces while preserving critical statistical dependencies that impact decision quality.},
results = {Extensive evaluation on Light-Dark Navigation, Kidnapped Robot, and Multi-Target Tracking benchmarks demonstrates that ESCORT consistently outperforms state-of-the-art belief approximation methods including transformers and density-based approaches. The framework achieves superior belief fidelity and decision quality across domains ranging from discrete to continuous high-dimensional problems.},
keywords = {partially observable Markov decision processes, belief representation, particle filtering, Stein variational gradient descent, optimal transport, stochastic optimization},
project_tags = {POMDP, scalable AI, planning}
}
@article{airoas_zhang,
title = {Observation Adaptation via Annealed Importance Resampling for Partially Observable Markov Decision Processes},
author = {Zhang, Yunuo and Luo, Baiting and Mukhopadhyay, Ayan and Dubey, Abhishek},
year = {2025},
month = sep,
journal = {Proceedings of the International Conference on Automated Planning and Scheduling},
volume = {35},
number = {1},
pages = {306--314},
doi = {10.1609/icaps.v35i1.36132},
url = {https://ojs.aaai.org/index.php/ICAPS/article/view/36132},
abstractnote = {Partially observable Markov decision processes (POMDPs) are a general mathematical model for sequential decision-making in stochastic environments under state uncertainty. POMDPs are often solved online, which enables the algorithm to adapt to new information in real time. Online solvers typically use bootstrap particle filters based on importance resampling for updating the belief distribution. Since directly sampling from the ideal state distribution given the latest observation and previous state is infeasible, particle filters approximate the posterior belief distribution by propagating states and adjusting weights through prediction and resampling steps. However, in practice, the importance resampling technique often leads to particle degeneracy and sample impoverishment when the state transition model poorly aligns with the posterior belief distribution, especially when the received observation is noisy. We propose an approach that constructs a sequence of bridge distributions between the state-transition and optimal distributions through iterative Monte Carlo steps, better accommodating noisy observations in online POMDP solvers. Our algorithm demonstrates significantly superior performance compared to state-of-the-art methods when evaluated across multiple challenging POMDP domains.},
what = {AIROAS introduces Annealed Importance Resampling for Observation Adaptation in online POMDP planning, addressing the challenge of belief state representation when direct sampling from optimal posterior distributions is infeasible. The approach maintains particle diversity through annealed importance resampling, creating smoothly interpolated intermediate distributions that bridge proposal and target distributions. This enables more efficient belief updates and superior planning performance compared to standard particle filtering approaches, particularly in deep searches where observation uncertainty is high.},
why = {Online POMDP planning requires accurate belief state representation to make decisions under uncertainty, but particle filtering struggles when the received observation provides highly informative evidence that moves beliefs far from prior distributions. AIROAS is innovative because it applies importance sampling tempering principles specifically for belief node updates in planning trees, improving upon standard particle filters by carefully controlling the transition between prior and posterior beliefs through a sequence of intermediate distributions.},
results = {Experimental evaluation across multiple POMDP planning domains demonstrates that AIROAS significantly improves planning performance by reducing particle degeneracy issues at deeper search nodes. The approach enables more effective belief representation while maintaining computational efficiency in online planning, achieving better decision quality compared to standard particle filtering approaches.},
keywords = {partially observable Markov decision processes, importance sampling, particle filtering, belief state representation, online planning, Monte Carlo methods},
project_tags = {POMDP, planning, scalable AI}
}
@inproceedings{keplinger2025nsgym,
author = {Keplinger, Nathaniel S. and Luo, Baiting and Bektas, Iliyas and Zhang, Yunuo and Wray, Kyle Hollins and Laszka, Aron and Dubey, Abhishek and Mukhopadhyay, Ayan},
title = {NS-Gym: Open-Source Simulation Environments and Benchmarks for Non-Stationary Markov Decision Processes},
year = {2025},
booktitle = {Proceedings of the 39th Conference on Neural Information Processing Systems (NeurIPS'25)},
archiveprefix = {arXiv},
contribution = {colab},
eprint = {2501.09646},
primaryclass = {cs.AI},
url = {https://arxiv.org/abs/2501.09646},
what = {NS-Gym is an open-source simulation toolkit for non-stationary Markov decision processes that segregates environmental parameter evolution from agent decision-making. The toolkit provides standardized interfaces for defining NS-MDPs, benchmark problems across different environmental change types, and implementations of state-of-the-art algorithmic approaches. NS-Gym enables systematic evaluation of decision-making algorithms under dynamic environments, addressing the gap in standardized benchmarks for non-stationary problems in fields like autonomous driving and resource optimization.},
why = {Many real-world decision-making problems involve non-stationary environments where the reward structure or transition dynamics change over time, yet most research assumes stationary conditions. The lack of standardized benchmarks and simulation interfaces has hindered systematic progress in non-stationary decision-making. NS-Gym is innovative because it provides the first comprehensive toolkit specifically designed for NS-MDPs, enabling researchers to evaluate algorithm robustness and adaptability under realistic environmental change patterns.},
results = {Benchmark results comparing six algorithmic approaches across multiple NS-MDP problem types demonstrate clear performance differences in handling environmental changes. The toolkit enables reproducible evaluation of both model-based and model-free approaches under various environmental conditions.},
keywords = {non-stationary environments, Markov decision processes, benchmark problems, decision-making under change, algorithm evaluation, reinforcement learning},
project_tags = {POMDP, planning, scalable AI}
}
In many real-world applications, agents must make sequential decisions in environments where conditions are subject to change due to various exogenous factors. These non-stationary environments pose significant challenges to traditional decision-making models, which typically assume stationary dynamics. Non-stationary Markov decision processes (NS-MDPs) offer a framework to model and solve decision problems under such changing conditions. However, the lack of standardized benchmarks and simulation tools has hindered systematic evaluation and advance in this field. We present NS-Gym, the first simulation toolkit designed explicitly for NS-MDPs, integrated within the popular Gymnasium framework. In NS-Gym, we segregate the evolution of the environmental parameters that characterize non-stationarity from the agent’s decision-making module, allowing for modular and flexible adaptations to dynamic environments. We review prior work in this domain and present a toolkit encapsulating key problem characteristics and types in NS-MDPs. This toolkit is the first effort to develop a set of standardized interfaces and benchmark problems to enable consistent and reproducible evaluation of algorithms under non-stationary conditions. We also benchmark six algorithmic approaches from prior work on NS-MDPs using NS-Gym. Our vision is that NS-Gym will enable researchers to assess the adaptability and robustness of their decision-making algorithms to non-stationary conditions.
@inproceedings{zhang2024,
author = {Zhang, Yunuo and Luo, Baiting and Mukhopadhyay, Ayan and Stojcsics, Daniel and Elenius, Daniel and Roy, Anirban and Jha, Susmit and Maroti, Miklos and Koutsoukos, Xenofon and Karsai, Gabor and Dubey, Abhishek},
booktitle = {2024 International Conference on Assured Autonomy (ICAA)},
title = {Shrinking POMCP: A Framework for Real-Time UAV Search and Rescue},
year = {2024},
pages = {48--57},
contribution = {lead},
doi = {10.1109/ICAA64256.2024.00016},
keywords = {path planning, search and rescue, partial observability, Monte Carlo tree search, autonomous systems, UAV operations, planning under uncertainty, belief state management},
what = {This paper presents a framework for efficient path planning and search in urban environments with uncertain information about target locations and environmental hazards. The approach formulates the problem as a partially observable Markov decision process and develops the Shrinking POMCP algorithm that reduces computational complexity by focusing search on promising regions of the state space. The method combines belief state updates with efficient search tree planning, allowing autonomous systems to locate targets while managing uncertainty in a real-time setting.},
why = {Search and rescue operations require autonomous systems to make decisions under significant uncertainty about target locations and environmental conditions. Traditional planning approaches either require perfect information or are computationally intractable for large environments. This work is innovative because it provides a scalable approach to planning under partial observability by intelligently focusing search resources on regions likely to contain targets, while maintaining the ability to adapt to new information discovered during execution.},
results = {The Shrinking POMCP algorithm demonstrates substantial improvements in computational efficiency compared to standard POMCP approaches while maintaining solution quality. Experimental validation in simulated environments shows that the approach successfully localizes targets while minimizing search time and computational resources. The method proves effective at balancing the need for comprehensive environment coverage with the constraint of limited planning time.},
project_tags = {emergency, POMDP, scalable AI}
}
Efficient path optimization for drones in search and rescue operations faces challenges, including limited visibility, time constraints, and complex information gathering in urban environments. We present a comprehensive approach to optimize UAV-based search and rescue operations in neighborhood areas, utilizing both a 3D AirSim-ROS2 simulator and a 2D simulator. The path planning problem is formulated as a partially observable Markov decision process (POMDP), and we propose a novel "Shrinking POMCP" approach to address time constraints. In the AirSim environment, we integrate our approach with a probabilistic world model for belief maintenance and a neurosymbolic navigator for obstacle avoidance. The 2D simulator employs surrogate ROS2 nodes with equivalent functionality. We compare trajectories generated by different approaches in the 2D simulator and evaluate performance across various belief types in the 3D AirSim-ROS simulator. Experimental results from both simulators demonstrate that our proposed shrinking POMCP solution achieves significant improvements in search times compared to alternative methods, showcasing its potential for enhancing the efficiency of UAV-assisted search and rescue operations.
@inproceedings{Mukhopadhyay2019,
author = {Mukhopadhyay, Ayan and Pettet, Geoffrey and Samal, Chinmaya and Dubey, Abhishek and Vorobeychik, Yevgeniy},
booktitle = {Proceedings of the 10th {ACM/IEEE} International Conference on Cyber-Physical Systems, {ICCPS} 2019, Montreal, QC, Canada},
title = {An online decision-theoretic pipeline for responder dispatch},
year = {2019},
pages = {185--196},
bibsource = {dblp computer science bibliography, https://dblp.org},
biburl = {https://dblp.org/rec/bib/conf/iccps/MukhopadhyayPSD19},
category = {selectiveconference},
contribution = {lead},
acceptance = {27},
doi = {10.1145/3302509.3311055},
file = {:Mukhopadhyay2019-An_Online_Decision_Theoretic_Pipeline_for_Responder_Dispatch.pdf:PDF},
keywords = {emergency response, responder dispatch, decision-theoretic planning, SMDP, incident prediction, survival analysis},
project = {smart-cities,smart-emergency-response},
tag = {ai4cps,incident},
timestamp = {Sun, 07 Apr 2019 16:25:36 +0200},
url = {https://doi.org/10.1145/3302509.3311055},
what = {This paper presents an online decision-theoretic pipeline for responder dispatch in emergency management systems. The work formulates the responder dispatch problem as a Semi-Markov Decision Process and develops an online incident prediction model based on survival analysis to enable real-time, data-driven dispatch decisions.},
why = {Emergency response systems face complex challenges in routing limited resources to incidents in dynamic urban environments. Traditional systems dispatch the nearest responder, which ignores future incident probabilities and environmental factors. This paper addresses these limitations through a principled decision-theoretic approach that integrates incident prediction with dynamic dispatch optimization.},
results = {The paper demonstrates the effectiveness of the approach through evaluation on real emergency services data from Nashville, Tennessee. The online prediction and dispatch pipeline reduces response times compared to baseline approaches while accounting for incident cascading effects and changing environmental dynamics. The work successfully bridges incident prediction and optimal dispatch decisions.},
project_tags = {emergency, POMDP, planning, scalable AI}
}
The problem of dispatching emergency responders to service traffic accidents, fire, distress calls and crimes plagues urban areas across the globe. While such problems have been extensively looked at, most approaches are offline. Such methodologies fail to capture the dynamically changing environments under which critical emergency response occurs, and therefore, fail to be implemented in practice. Any holistic approach towards creating a pipeline for effective emergency response must also look at other challenges that it subsumes - predicting when and where incidents happen and understanding the changing environmental dynamics. We describe a system that collectively deals with all these problems in an online manner, meaning that the models get updated with streaming data sources. We highlight why such an approach is crucial to the effectiveness of emergency response, and present an algorithmic framework that can compute promising actions for a given decision-theoretic model for responder dispatch. We argue that carefully crafted heuristic measures can balance the trade-off between computational time and the quality of solutions achieved and highlight why such an approach is more scalable and tractable than traditional approaches. We also present an online mechanism for incident prediction, as well as an approach based on recurrent neural networks for learning and predicting environmental features that affect responder dispatch. We compare our methodology with prior state-of-the-art and existing dispatch strategies in the field, which show that our approach results in a reduction in response time with a drastic reduction in computational time.
@inproceedings{pettet2024decision,
author = {Pettet, Ava and Zhang, Yunuo and Luo, Baiting and Wray, Kyle and Baier, Hendrik and Laszka, Aron and Dubey, Abhishek and Mukhopadhyay, Ayan},
booktitle = {Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems},
title = {Decision Making in Non-Stationary Environments with Policy-Augmented Search},
year = {2024},
address = {Richland, SC},
acceptance = {36},
pages = {2417--2419},
publisher = {International Foundation for Autonomous Agents and Multiagent Systems},
series = {AAMAS '24},
contribution = {lead},
isbn = {9798400704864},
note = {extended abstract},
keywords = {non-stationary MDPs, policy learning, Monte Carlo tree search, sequential decision-making, online planning, offline learning, policy augmentation},
location = {Auckland, New Zealand},
numpages = {3},
what = {This paper develops a policy-augmented Monte Carlo tree search framework for making decisions in non-stationary environments where the agent's policy may need to be updated as conditions change. The approach combines offline policy learning using Q-values learned in a previous environment with online MCTS planning to handle the case where environment dynamics have shifted. The method includes theoretical analysis showing conditions under which combining the learned policy with online search ensures the algorithm selects optimal or near-optimal actions.},
why = {Many real-world decision-making systems face the challenge that environmental conditions change over time, making previously learned policies suboptimal. This work is innovative because it provides theoretical guarantees about when and how to combine learned policies with online planning to maintain performance despite environmental changes. The approach is elegant and applicable to a wide range of domains, from emergency response to transportation, where the policy learned in one setting may not be optimal when conditions evolve.},
results = {The theoretical analysis provides conditions under which the policy-augmented approach is guaranteed to select optimal actions, with bounds on the error incurred when policies are updated. Experimental validation on classic control tasks shows that the approach achieves robust performance superior to either pure offline learning or pure online planning when facing non-stationary environments. The method successfully balances the speed of learned policies with the adaptability of online search.},
project_tags = {POMDP, scalable AI}
}
Sequential decision-making is challenging in non-stationary environments, where the environment in which an agent operates can change over time. Policies learned before execution become stale when the environment changes, and relearning takes time and computational effort. Online search, on the other hand, can return sub-optimal actions when there are limitations on allowed runtime. In this paper, we introduce Policy-Augmented Monte Carlo tree search (PA-MCTS), which combines action-value estimates from an out-of-date policy with an online search using an up-to-date model of the environment. We prove several theoretical results about PA-MCTS. We also compare and contrast our approach with AlphaZero, another hybrid planning approach, and Deep Q Learning on several OpenAI Gym environments and show that PA-MCTS outperforms these baselines.
@inproceedings{talusan2024ICCPS,
author = {Talusan, Jose Paolo and Han, Chaeeun and Mukhopadhyay, Ayan and Laszka, Aron and Freudberg, Dan and Dubey, Abhishek},
booktitle = {Proceedings of the ACM/IEEE 15th International Conference on Cyber-Physical Systems (ICCPS)},
title = {An Online Approach to Solving Public Transit Stationing and Dispatch Problem},
year = {2024},
address = {New York, NY, USA},
publisher = {Association for Computing Machinery},
series = {ICCPS '24},
contribution = {lead},
note = {Best paper award},
acceptance = {28.2},
location = {Hong Kong, China},
numpages = {10},
what = {This work develops a software framework for public transit stationing and dispatch that solves the problem of optimally assigning substitute buses when the fixed-line fleet experiences disruptions. The system models the problem as a semi-Markov decision process and uses Monte Carlo tree search to find good dispatching decisions. The approach includes both offline optimization for planned scheduling and online components for responding to real-time disruptions, with integration into a complete transit management system.},
why = {When transit buses break down or experience incidents, agencies must quickly decide which substitute vehicles to dispatch to cover affected trips. This decision-making problem combines aspects of scheduling, resource allocation, and real-time optimization. The work is important because it addresses the practical challenge of making good decisions under uncertainty with limited time and information, using both planning and learning techniques to balance the need for speed with solution quality.},
results = {The MCTS-based approach successfully solves the stationing and dispatch problem for real transit instances, outperforming greedy baseline approaches. The system demonstrates the ability to handle both pre-planned scheduling for known trip patterns and dynamic reallocation when disruptions occur. Results show how tree search methods can effectively explore the space of alternative dispatching strategies to find solutions that minimize passenger impact.},
keywords = {transit dispatch, vehicle routing, disruption response, online optimization, Monte Carlo tree search, resource allocation, real-time decision-making},
project_tags = {transit, emergency, POMDP, middleware}
}
Public bus transit systems provide critical transportation services for large sections of modern communities. On-time performance and reliable quality of service are therefore very important. Unfortunately, disruptions caused by overcrowding, vehicular failures, and road accidents often lead to service performance degradation. Though transit agencies keep a limited number of vehicles in reserve and dispatch them to relieve the affected routes during disruptions, the procedure is often ad hoc and has to rely on human experience and intuition to allocate resources (vehicles) to affected trips under uncertainty. In this paper, we describe a principled approach using non-myopic sequential decision procedures to solve the problem and decide (a) whether it is advantageous to anticipate problems and proactively station transit buses near areas with a high likelihood of disruptions and (b) whether, and which, vehicle to dispatch to a particular problem. Our approach was developed in partnership with the Metropolitan Transportation Authority for a mid-sized city in the USA. It models the system as a semi-Markov decision problem (solved via a Monte Carlo tree search procedure) and shows that it is possible to answer these two coupled decision problems in a way that maximizes the overall reward (number of people served). We sample many possible futures from generative models, each of which is assigned to its own tree and processed using root parallelization. We validate our approach using 3 years of data from our partner agency. Our experiments show that the proposed framework serves 2% more passengers while reducing deadhead miles by 40%.
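To make the root-parallelization idea concrete, here is a minimal sketch, not the paper's implementation: each sampled future gets its own small UCT tree over a short dispatch horizon, and the root action statistics are aggregated across trees. The route count, disruption-severity model, horizon, and exploration constant are all illustrative assumptions.

```python
import math
import random
from collections import defaultdict

ROUTES = [0, 1, 2]   # candidate routes for a reserve bus (illustrative)
HORIZON = 2          # look ahead two dispatch decisions
N_SCENARIOS = 8      # sampled futures: one MCTS tree per scenario
N_ITERS = 200        # UCT iterations per tree

def sample_future(rng):
    """Hypothetical generative model: stranded passengers per route, per step."""
    return [[rng.randint(0, 100) for _ in ROUTES] for _ in range(HORIZON)]

def step(future, t, served, action):
    """Reward = passengers picked up; a route already covered yields nothing."""
    reward = 0 if action in served else future[t][action]
    return served | {action}, reward

def mcts(future, rng, n_iters=N_ITERS, c=50.0):
    """One UCT tree for a single sampled future; returns root statistics."""
    N = defaultdict(int)    # visit counts per (history, action)
    Q = defaultdict(float)  # mean return per (history, action)

    def search(hist, served, t):
        if t == HORIZON:
            return 0.0
        total = sum(N[(hist, a)] for a in ROUTES)
        if total < len(ROUTES):  # expansion: try each action once first
            a = rng.choice([a for a in ROUTES if N[(hist, a)] == 0])
        else:                    # UCB1 selection
            a = max(ROUTES, key=lambda a: Q[(hist, a)]
                    + c * math.sqrt(math.log(total) / N[(hist, a)]))
        served2, r = step(future, t, served, a)
        ret = r + search(hist + (a,), served2, t + 1)
        N[(hist, a)] += 1
        Q[(hist, a)] += (ret - Q[(hist, a)]) / N[(hist, a)]  # running mean
        return ret

    for _ in range(n_iters):
        search((), frozenset(), 0)
    return {a: (N[((), a)], Q[((), a)]) for a in ROUTES}

def root_parallel_dispatch(seed=0):
    """Aggregate root statistics across per-scenario trees and
    return the action with the highest visit-weighted value."""
    rng = random.Random(seed)
    value = defaultdict(float)
    visits = defaultdict(int)
    for _ in range(N_SCENARIOS):
        for a, (n, q) in mcts(sample_future(rng), rng).items():
            value[a] += n * q
            visits[a] += n
    return max(ROUTES, key=lambda a: value[a] / max(visits[a], 1))
```

In the full system each tree can run on its own worker, which is the practical appeal of root parallelization: the trees never need to synchronize until the final vote.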
@inproceedings{sivagnanam2022offline,
author = {Sivagnanam, Amutheezan and Kadir, Salah Uddin and Mukhopadhyay, Ayan and Pugliese, Philip and Dubey, Abhishek and Samaranayake, Samitha and Laszka, Aron},
booktitle = {31st International Joint Conference on Artificial Intelligence (IJCAI)},
title = {Offline Vehicle Routing Problem with Online Bookings: A Novel Problem Formulation with Applications to Paratransit},
year = {2022},
acceptance = {15},
month = jul,
contribution = {colab},
preprint = {https://arxiv.org/abs/2204.11992},
what = {This work addresses the offline vehicle routing problem with online bookings for paratransit services, where pickup windows are selected at the time of booking rather than predetermined. The authors propose a formulation combining an offline vehicle routing model with an online bookings model, and present computational approaches including an anytime algorithm with reinforcement learning and a Markov decision process formulation.},
why = {Paratransit services for elderly and disabled passengers require high flexibility in response to real-time requests while maintaining operational efficiency. This work is novel because it bridges the gap between offline and online routing problems with practical constraints on pickup windows. The combination of optimization and learning approaches enables the system to adapt to dynamic demand while respecting the transportation agency's operational requirements.},
results = {The proposed methods were evaluated using real-world paratransit data from Chattanooga, showing that the anytime algorithm with learning outperforms baseline approaches. The reinforcement learning approach effectively learns policies that balance responsiveness to immediate requests with long-term efficiency considerations. The experimental results demonstrate significant improvements in cost reduction and robustness when environmental conditions change dynamically.},
keywords = {vehicle routing, online optimization, paratransit services, reinforcement learning, demand-responsive transport},
project_tags = {transit, planning, scalable AI, POMDP}
}
Vehicle routing problems (VRPs) can be divided into two major categories: offline VRPs, which consider a given set of trip requests to be served, and online VRPs, which consider requests as they arrive in real-time. Based on discussions with public transit agencies, we identify a real-world problem that is not addressed by existing formulations: booking trips with flexible pickup windows (e.g., 3 hours) in advance (e.g., the day before) and confirming tight pickup windows (e.g., 30 minutes) at the time of booking. Such a service model is often required in paratransit service settings, where passengers typically book trips for the next day over the phone. To address this gap between offline and online problems, we introduce a novel formulation, the offline vehicle routing problem with online bookings. This problem is very challenging computationally since it faces the complexity of considering large sets of requests—similar to offline VRPs—but must abide by strict constraints on running time—similar to online VRPs. To solve this problem, we propose a novel computational approach, which combines an anytime algorithm with a learning-based policy for real-time decisions. Based on a paratransit dataset obtained from our partner transit agency, we demonstrate that our novel formulation and computational approach lead to significantly better outcomes in this service setting than existing algorithms.
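The key operational constraint above is that a tight pickup window must be confirmed while the passenger is still on the phone. The sketch below illustrates that anytime flavor with a simple greedy insertion, not the paper's algorithm or its learned policy: candidate (vehicle, slot) pairs are scanned under a wall-clock budget, keeping the cheapest feasible insertion found so far. The travel-time model, slot granularity, and budget are made-up assumptions.

```python
import itertools
import time

SLOT = 30   # tight pickup window confirmed at booking time (minutes)
FLEX = 180  # flexible window the passenger initially requests (minutes)

def travel(a, b):
    """Hypothetical travel-time model: Manhattan distance in minutes."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def confirm_booking(routes, pickup_loc, earliest, budget_s=0.05):
    """Anytime confirmation: scan (vehicle, slot-start) candidates,
    keep the cheapest feasible appended pickup, stop when the
    wall-clock budget runs out, and return the confirmed tight window.

    routes: per-vehicle lists of (time, location) stops, last stop current.
    """
    t_end = time.monotonic() + budget_s
    best = None  # (deadhead_cost, vehicle, slot_start, arrival_time)
    slot_starts = range(earliest, earliest + FLEX - SLOT + 1, SLOT)
    for v, start in itertools.product(range(len(routes)), slot_starts):
        if time.monotonic() > t_end:
            break  # anytime cutoff: answer with the best found so far
        last_t, last_loc = routes[v][-1]
        cost = travel(last_loc, pickup_loc)
        arrive = last_t + cost
        if arrive <= start + SLOT and (best is None or cost < best[0]):
            best = (cost, v, start, arrive)
    if best is None:
        return None  # no feasible tight window within the flexible window
    _, v, start, arrive = best
    routes[v].append((max(start, arrive), pickup_loc))
    return v, (start, start + SLOT)
```

The real system replaces this greedy scan with an anytime optimization algorithm guided by a learned policy, but the interface is the same: a request arrives, and a tight window must be committed to within a strict time budget.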
These foundational methods power our use-inspired projects: