Non-myopic planning that fuses neural perception with symbolic reasoning and formal assurances — fast, safe, and far-sighted decisions at scale.
Cities run on sequential decisions made under uncertainty. A transit agency must assign vehicles to ride requests as they arrive, without knowing what requests come next. An emergency dispatcher must position responders across a city in anticipation of incidents that haven’t happened yet. An EV fleet coordinator must schedule charging and discharging over a multi-day horizon while electricity prices fluctuate. In each case, the decision-maker faces the same core challenge: act now, with incomplete information, in a way that accounts for what might happen later.
The standard approaches to this problem each have a critical weakness. Reinforcement learning (RL) trains a policy offline through simulation, producing fast decisions at runtime — but the policy becomes stale when conditions change (a bus breaks down, demand shifts, a new route is added), and retraining is expensive. Online planning methods like Monte Carlo Tree Search (MCTS) adapt in real time by searching over possible futures, but they start from scratch at every decision point, ignoring everything learned from past experience.
Our research asks: can we combine the strengths of both?
The core idea behind our work is a hybrid neuro-symbolic architecture that integrates learned policies with online search. Rather than choosing between offline learning and online planning, we use them together: neural networks handle perception, prediction, and long-horizon value estimation, while symbolic tree search handles constraint enforcement, structured exploration, and real-time adaptation.
The key insight, inspired by how AlphaZero combined deep learning with MCTS for game-playing, is that a previously learned policy — even a stale one — still carries useful information. A policy trained on thousands of simulated episodes encodes knowledge about which actions tend to be good in which states. When the environment changes, this knowledge doesn’t become worthless; it becomes an informed starting point. Our Policy-Augmented MCTS (PA-MCTS) uses the learned policy’s value estimates to guide the tree search, focusing computation on the most promising branches rather than exploring blindly. The policy accelerates convergence; the search provides adaptability.
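As a minimal sketch of this idea (not the paper's exact formulation), policy augmentation can be expressed as a UCT-style selection rule whose exploitation term blends the stale policy's value estimate with the empirical search return. The weight `alpha`, the constant `c`, and the dictionary-based node representation here are illustrative assumptions:

```python
import math

def pa_uct_score(child, parent_visits, policy_value, alpha=0.5, c=1.4):
    """Blend the node's empirical search return with the stale policy's
    value estimate, plus a standard UCT exploration bonus.

    alpha trades off trust in the learned policy (alpha=1) against
    pure online search (alpha=0); this weighting is illustrative.
    """
    search_value = child["total_return"] / max(child["visits"], 1)
    blended = alpha * policy_value + (1 - alpha) * search_value
    exploration = c * math.sqrt(math.log(parent_visits) / max(child["visits"], 1))
    return blended + exploration

def select_action(children, parent_visits, policy_values):
    # Pick the child maximizing the policy-augmented UCT score.
    return max(
        children,
        key=lambda a: pa_uct_score(children[a], parent_visits, policy_values[a]),
    )
```

With `alpha = 0`, this reduces to plain UCT; as `alpha` grows, the search increasingly follows the learned policy's prior judgment, which is what focuses computation on promising branches.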
This hybrid approach addresses a fundamental tension in real-world decision-making: the environment is non-stationary — conditions change continuously — but we cannot afford to relearn everything from scratch each time. Our framework formalizes this through the concept of Q-bounded non-stationary MDPs, which bounds how much the value of actions can shift over time and uses these bounds to decide when the learned policy can be trusted and when the online search should take over.
Three innovations distinguish our approach from prior work in online planning.
Adaptive learning during planning. Most planners treat the environment model as fixed during search. Our ADA-MCTS (Adaptive Monte Carlo Tree Search) algorithm learns about changes in the environment as it plans. It separates what it doesn’t know because the world is inherently random (aleatoric uncertainty) from what it doesn’t know because it hasn’t explored enough (epistemic uncertainty). In regions of the state space where the agent has gathered updated knowledge, it acts optimistically; where knowledge is lacking, it hedges. This “act as you learn” principle allows the planner to be adaptive without being reckless.
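One standard way to separate the two kinds of uncertainty, shown here as a simplified sketch rather than ADA-MCTS's exact Bayesian treatment, is to query an ensemble of learned transition models: noise the models agree on is aleatoric, while disagreement between the models is epistemic.

```python
import statistics

def split_uncertainty(ensemble_predictions):
    """Given each ensemble member's predicted outcome (mean, variance)
    for the same state-action pair, approximate:
      - aleatoric uncertainty: average predicted variance
        (irreducible randomness in the world),
      - epistemic uncertainty: variance of the means
        (disagreement between models, reducible by exploring more).
    """
    means = [m for m, _ in ensemble_predictions]
    variances = [v for _, v in ensemble_predictions]
    aleatoric = sum(variances) / len(variances)
    epistemic = statistics.pvariance(means)
    return aleatoric, epistemic
```

A planner can then hedge only where the epistemic term is large, instead of treating all uncertainty as a reason for pessimism.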
Scalability through learned temporal abstraction. Tree search suffers from the curse of dimensionality — continuous state and action spaces create enormous branching factors. Our L-MAP framework addresses this by training a neural network (VQ-VAE) to compress system dynamics into discrete latent codes that represent “macro-actions,” collapsing many fine-grained steps into single abstract decisions. Symbolic MCTS then searches efficiently in this compressed space. The result is orders-of-magnitude speedup while maintaining solution quality — enabling planning in domains that were previously intractable for tree search.
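The quantization step at the heart of this idea can be sketched in a few lines. The real L-MAP encoder is a learned, state-conditioned VQ-VAE; here the codebook is given and the "encoder" is the identity, purely to illustrate how a fine-grained action sequence collapses to a single discrete macro-action:

```python
def quantize_action_sequence(action_seq, codebook):
    """Map a flattened sequence of fine-grained actions to its nearest
    codebook entry (a discrete 'macro-action'), as in a VQ bottleneck.
    Tree search then branches over the small set of code indices
    instead of the continuous action space."""
    def dist2(code):
        # Squared Euclidean distance to a codebook vector.
        return sum((a - c) ** 2 for a, c in zip(action_seq, code))

    code_index = min(range(len(codebook)), key=lambda i: dist2(codebook[i]))
    return code_index, codebook[code_index]
```

With a codebook of size K, the search's branching factor drops from (continuous)^horizon to K per abstract step, which is where the speedup comes from.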
Persistent value functions for long-horizon reasoning. Many real-world objectives (reducing peak energy costs over a month, minimizing total passenger wait time over a day) require reasoning over horizons far longer than any single planning episode. Our Persistent V2B work introduces neural value functions that estimate cost-to-go over multi-day windows, giving the symbolic planner a sense of long-term consequence that pure myopic search cannot achieve.
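Mechanically, the planner uses such a value function to bootstrap its search leaves: a leaf is scored by its discounted in-episode rewards plus the learned long-horizon cost-to-go at the frontier. A minimal sketch, where `value_fn` stands in for the trained persistent value network:

```python
def leaf_value(rollout_rewards, terminal_state, value_fn, gamma=0.99):
    """Score a search leaf as the discounted sum of rewards observed
    along the rollout, plus a learned estimate of long-horizon value
    at the frontier state.  The bootstrap term is what lets a one-day
    planning episode account for month-scale consequences."""
    ret = 0.0
    for t, r in enumerate(rollout_rewards):
        ret += (gamma ** t) * r
    return ret + (gamma ** len(rollout_rewards)) * value_fn(terminal_state)
```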
Public transit is where our planning framework has seen its deepest deployment and validation. Transit agencies face a particularly clear version of the online planning problem: vehicles must be assigned to trips, dispatched to disruptions, and repositioned proactively — all in real time, all under uncertainty about future demand and road conditions.
Working with Nashville’s WeGo Public Transit and Chattanooga’s CARTA, we developed online planners that treat the dispatch problem as a semi-Markov decision process solved via MCTS. The planner samples many possible futures from generative demand models, evaluates them in parallel, and produces dispatch recommendations that maximize the number of passengers served. In a three-year validation study, our approach served 2% more passengers while reducing deadhead (empty driving) miles by 40% — a significant operational improvement.
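The abstract for this work describes assigning each sampled future to its own tree and merging via root parallelization. A simplified sketch of that merge step, run sequentially here for clarity (the deployed system evaluates trees in parallel); `run_search` is a hypothetical stand-in for a single-scenario MCTS returning per-action root statistics:

```python
from collections import defaultdict

def root_parallel_plan(sampled_futures, run_search):
    """Run an independent tree search on each sampled demand scenario,
    then merge root-action statistics by summing visits and returns
    across trees.  run_search(scenario) is assumed to return
    {action: (visits, total_return)} for the shared root state."""
    merged = defaultdict(lambda: [0, 0.0])
    for scenario in sampled_futures:
        for action, (visits, total) in run_search(scenario).items():
            merged[action][0] += visits
            merged[action][1] += total
    # Recommend the action with the highest merged mean return.
    return max(merged, key=lambda a: merged[a][1] / merged[a][0])
```

Because every tree shares the same root actions, the merge amounts to pooling evidence about each dispatch option across many plausible futures.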
We have also addressed the broader challenge of mobility-on-demand, building complete systems for microtransit and paratransit operations that combine vehicle routing, real-time dispatch, and passenger scheduling into a unified decision pipeline. For paratransit specifically, we developed an offline-online hybrid: the system solves an offline vehicle routing problem to create baseline schedules, then adapts online as new bookings and cancellations arrive — an approach validated on real paratransit operations that reduced empty vehicle runs by 50%.
The same planning architecture extends naturally to other domains. For vehicle-to-building (V2B) energy management, we developed an RL framework that combines deep deterministic policy gradient methods with constraint-aware action masking to optimize charging and discharging of EV fleets at office buildings. This work, nominated for best paper at AAMAS 2025, demonstrated significant cost savings on real-world data from a major EV manufacturer. Building on this, our neuro-symbolic control approach integrates persistent value functions with symbolic constraint enforcement to coordinate EV batteries as energy reservoirs — reducing peak demand while guaranteeing that every vehicle meets its departure charge requirements.
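Action masking can be illustrated with a much-simplified version of the charging constraints: clip a raw policy output (signed charging power) so the battery stays within limits and enough charging time remains to meet the departure requirement. The function name, arguments, and the single departure constraint are illustrative; the deployed constraint set is richer:

```python
def mask_action(power_kw, soc_kwh, cap_kwh, max_kw, dt_h,
                required_kwh, hours_left):
    """Project a raw policy action (+charge / -discharge, in kW) onto a
    simplified feasible set:
      - charger and battery capacity limits, and
      - a departure constraint: after this step, charging at full rate
        for the remaining time must still reach required_kwh.
    Assumes the problem is feasible (lo <= hi); infeasible cases would
    need separate handling in a real system."""
    hi = min(max_kw, (cap_kwh - soc_kwh) / dt_h)
    lo = max(-max_kw, -soc_kwh / dt_h)
    deficit = required_kwh - soc_kwh
    future_max = max(hours_left - dt_h, 0.0) * max_kw
    lo = max(lo, (deficit - future_max) / dt_h)
    return min(max(power_kw, lo), hi)
```

The key property is that the mask can *force* charging: a discharge request near departure time is overridden so the vehicle still meets its required charge, which is how feasibility is guaranteed regardless of what the learned policy proposes.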
For emergency response, our multi-agent planning algorithms coordinate the stationing and dispatch of fire engines, ambulances, and police units across metropolitan areas, using neural demand models to anticipate where incidents will occur and symbolic constraints to ensure minimum coverage is never violated.
Our neuro-symbolic planning work has been recognized at top venues: an ICLR 2025 Spotlight for L-MAP, a Best Paper Award at ICCPS 2024 for our transit dispatch planner, and a Best Paper Nomination at AAMAS 2025 for V2B energy management. The algorithms have been piloted in real transit operations in Nashville and Chattanooga, and the framework has been applied across transit, energy, and emergency response — demonstrating that a single principled approach to combining learning with search can address sequential decision-making challenges across multiple societal-scale domains.
Selected Publications:
@inproceedings{iccps2026_pv2b,
author = {Sen, Rishav and Liu, Fangqi and Talusan, Jose Paolo and Pettet, Ava and Suzue, Yoshinori and Mukhopadhyay, Ayan and Dubey, Abhishek},
title = {Persistent Vehicle-to-Building Integration via Neuro-Symbolic Control},
year = {2026},
booktitle = {Proceedings of the HSCC/ICCPS 2026: 29th ACM International Conference on Hybrid Systems: Computation and Control and 17th ACM/IEEE International Conference on Cyber-Physical Systems},
location = {Saint Malo, France},
keywords = {vehicle-to-building, EV charging, demand charge management, user persistence, neuro-symbolic control, Monte Carlo tree search, model predictive control},
note = {Acceptance rate: 28\%; Regular Paper; Track: Systems and Applications},
series = {HSCC/ICCPS '26},
what = {P-V2B introduces a neuro-symbolic framework for vehicle-to-building charging that incorporates user persistence information alongside technical optimization. The work addresses the persistent user problem where electric vehicles exhibit recurring arrival patterns over time at buildings, enabling buildings to anticipate charging demand and schedule charging strategically. The approach combines a neuro-symbolic control framework integrating Monte Carlo Model Predictive Control with a learned value function to handle both short-horizon feasibility and long-horizon demand-charge prediction, accounting for user behavior patterns while managing real-time constraints.},
why = {Vehicle-to-building systems present a complex control challenge combining real-time physical constraints with long-horizon stochastic effects of user behavior, where traditional decomposition approaches fail to capture crucial dependencies. The innovation lies in explicitly leveraging user persistence—the observation that EV users exhibit recurring patterns—as a key input alongside technical constraints, enabling more intelligent demand charge management. This bridges control theory and behavioral modeling, providing a principled way to incorporate user behavioral patterns into cyber-physical system optimization.},
results = {Evaluation on real EV fleet data from a major California manufacturer demonstrates substantial improvements in demand charge reduction and total operating costs compared to both heuristic baselines and prior work that ignore user persistence. The neuro-symbolic framework achieves significant cost savings while ensuring feasibility and full compliance with user charging requirements, validating the effectiveness of persistence-aware control strategies.},
project_tags = {energy, CPS, planning}
}
Vehicle-to-Building (V2B) integration is a cyber–physical system (CPS) where Electric Vehicles (EVs) enhance building resilience by serving as mobile storage for peak shaving, reducing monthly peak-power demand charges, supporting grid stability, and lowering electricity costs. We introduce the Persistent Vehicle-to-Building (P-V2B) problem, a long-horizon formulation that incorporates user-level persistence, where each EV corresponds to a consistent user identity across days. This structure captures recurring arrival patterns and travel-related external energy use, common in employee-based facilities with regular commuting behavior. Persistence enables multi-day strategies that are unattainable in single-day formulations, such as over-charging on low-demand days to support discharging during future high-demand periods. Real-time decision making in this CPS setting presents three key challenges: (i) uncertainty in long-term EV behavior and building load forecasts, which causes traditional control and heuristic methods to degrade under stochastic conditions; (ii) inter-day coupling of decisions and rewards, where early actions affect downstream feasible charging and discharging opportunities, complicating long-horizon optimization; and (iii) high-dimensional continuous action spaces, which exacerbate the curse of dimensionality in reinforcement learning (RL) and search-based approaches. To address these challenges, we propose a neuro-symbolic framework that integrates a constraint-based Monte Carlo Model Predictive Control (MC-MPC) layer with a learned Value Function (VF). The MC–MPC enforces physical feasibility and manages environmental uncertainty, while the VF provides long-term strategic foresight. 
Evaluations using real building and EV fleet data from an EV manufacturer in California demonstrate that the hybrid framework substantially outperforms state-of-the-art baselines, significantly reducing demand charge and total energy costs, while ensuring feasibility and full compliance with user charging requirements.
@inproceedings{liu2024reinforcement,
author = {Liu, Fangqi and Sen, Rishav and Talusan, Jose and Pettet, Ava and Kandel, Aaron and Suzue, Yoshinori and Mukhopadhyay, Ayan and Dubey, Abhishek},
booktitle = {Proceedings of the 24th International Conference on Autonomous Agents and MultiAgent Systems, {AAMAS} 2025, Detroit, Michigan},
title = {Reinforcement Learning-based Approach for Vehicle-to-Building Charging with Heterogeneous Agents and Long Term Rewards},
year = {2025},
address = {Richland, SC},
note = {nominated for best paper},
organization = {International Conference on Autonomous Agents and Multi-Agent Systems},
publisher = {International Foundation for Autonomous Agents and Multiagent Systems},
series = {AAMAS '25},
acceptance = {24.5},
category = {selective},
contribution = {lead},
location = {Detroit, Michigan},
what = {This work proposes a reinforcement learning-based approach for vehicle-to-building charging that combines Deep Deterministic Policy Gradient with action masking and policy guidance. The framework models V2B as a Markov decision process with continuous action spaces and constraints, using action masking to ensure feasibility and policy guidance to improve learning efficiency. The approach incorporates domain-specific knowledge about charging physics, building loads, and grid constraints while maintaining flexibility to adapt to new operational scenarios.},
why = {Vehicle-to-building energy management presents a high-dimensional, continuous control problem under uncertainty where traditional optimization methods struggle with real-time responsiveness and scalability. This work is innovative because it combines modern deep reinforcement learning with domain-specific constraints and knowledge, enabling scalable learning of near-optimal charging policies that naturally adapt to building dynamics and user behavior without requiring explicit model calibration.},
results = {Evaluation on real EV fleet data from a major manufacturer demonstrates significant cost savings while meeting all user charging requirements and grid constraints. The learned policies achieve substantial improvements in demand charge reduction and total operating costs compared to both heuristic baselines and model-predictive control approaches.},
keywords = {electric vehicle charging, reinforcement learning, deep deterministic policy gradient, building energy management, demand response, stochastic control},
project_tags = {energy, planning, ML for CPS}
}
Strategic aggregation of electric vehicle batteries as energy reservoirs can optimize power grid demand, benefiting smart and connected communities, especially large office buildings that offer workplace charging. This involves optimizing charging and discharging to reduce peak energy costs and net peak demand, monitored over extended periods (e.g., a month), which requires making sequential decisions under uncertainty and delayed and sparse rewards, a continuous action space, and the complexity of ensuring generalization across diverse conditions. Existing algorithmic approaches, e.g., heuristic-based strategies, fall short in addressing real-time decision-making under dynamic conditions, and traditional reinforcement learning (RL) models struggle with large state-action spaces, multi-agent settings, and the need for long-term reward optimization. To address these challenges, we introduce a novel RL framework that combines the Deep Deterministic Policy Gradient approach (DDPG) with action masking and efficient MILP-driven policy guidance. Our approach balances exploration of continuous action spaces with the need to meet user charging demands. Using real-world data from a major electric vehicle manufacturer, we show that our approach comprehensively outperforms many well-established baselines and several scalable heuristic approaches, achieving significant cost savings while meeting all charging requirements. Our results show that the proposed approach is one of the first scalable and general approaches to solving the V2B energy management challenge.
@inproceedings{luo2025scalable,
author = {Luo, Baiting and Pettet, Ava and Laszka, Aron and Dubey, Abhishek and Mukhopadhyay, Ayan},
booktitle = {Proceedings of the 13th International Conference on Learning Representations, Singapore},
title = {Scalable Decision-Making In Stochastic Environments Through Learned Temporal Abstraction},
year = {2025},
organization = {International Conference on Learning Representations},
acceptance = {32.8},
category = {selective},
contribution = {colab},
what = {This paper proposes Latent Macro Action Planner, which addresses sequential decision-making in high-dimensional continuous action spaces through learned temporal abstractions. The approach uses a state-conditioned vector quantized variational autoencoder to discretize complex action sequences into manageable macro-actions, enabling efficient planning in pre-constructed latent spaces. The framework combines Monte Carlo Tree Search for planning with learned prior policies, allowing effective exploration and exploitation under both deterministic and stochastic dynamics.},
why = {Planning in high-dimensional continuous action spaces suffers from the curse of dimensionality and the curse of history, making real-time decision-making challenging even with advanced planning methods. This work is innovative because it demonstrates how learned temporal abstractions can dramatically reduce computational complexity while maintaining decision quality, enabling fast planning in complex stochastic environments. The approach bridges the gap between neural representation learning and classical planning methods.},
results = {Evaluation across diverse continuous control tasks including robotic manipulation and autonomous driving demonstrates that the approach achieves better performance with lower decision latency compared to both model-based baselines and direct RL methods. The framework scales effectively to high-dimensional problems where traditional planning becomes infeasible.},
keywords = {temporal abstraction, planning under uncertainty, continuous action spaces, latent representations, Monte Carlo tree search, reinforcement learning, stochastic dynamics},
project_tags = {scalable AI, planning, POMDP}
}
Sequential decision-making in high-dimensional continuous action spaces, particularly in stochastic environments, faces significant computational challenges. We explore this challenge in the traditional offline RL setting, where an agent must learn how to make decisions based on data collected through a stochastic behavior policy. We present Latent Macro Action Planner (L-MAP), which addresses this challenge by learning a set of temporally extended macro-actions through a state-conditional Vector Quantized Variational Autoencoder (VQ-VAE), effectively reducing action dimensionality. L-MAP employs a (separate) learned prior model that acts as a latent transition model and allows efficient sampling of plausible actions. During planning, our approach accounts for stochasticity in both the environment and the behavior policy by using Monte Carlo tree search (MCTS). In offline RL settings, including stochastic continuous control tasks, L-MAP efficiently searches over discrete latent actions to yield high expected returns. Empirical results demonstrate that L-MAP maintains low decision latency despite increased action dimensionality. Notably, across tasks ranging from continuous control with inherently stochastic dynamics to high-dimensional robotic hand manipulation, L-MAP significantly outperforms existing model-based methods and performs on par with strong model-free actor-critic baselines, highlighting the effectiveness of the proposed approach in planning in complex and stochastic environments with high-dimensional action spaces.
@inproceedings{talusan2024ICCPS,
author = {Talusan, Jose Paolo and Han, Chaeeun and Mukhopadhyay, Ayan and Laszka, Aron and Freudberg, Dan and Dubey, Abhishek},
booktitle = {Proceedings of the ACM/IEEE 15th International Conference on Cyber-Physical Systems (ICCPS)},
title = {An Online Approach to Solving Public Transit Stationing and Dispatch Problem},
year = {2024},
address = {New York, NY, USA},
publisher = {Association for Computing Machinery},
series = {ICCPS '24},
contribution = {lead},
note = {Best paper award},
acceptance = {28.2},
location = {Hong Kong, China},
numpages = {10},
what = {This work develops a software framework for public transit stationing and dispatch that solves the problem of optimally assigning substitute buses when the fixed-line fleet experiences disruptions. The system models the problem as a semi-Markov decision process and uses Monte Carlo tree search to find good dispatching decisions. The approach includes both offline optimization for planned scheduling and online components for responding to real-time disruptions, with integration into a complete transit management system.},
why = {When transit buses break down or experience incidents, agencies must quickly decide which substitute vehicles to dispatch to cover affected trips. This decision-making problem combines aspects of scheduling, resource allocation, and real-time optimization. The work is important because it addresses the practical challenge of making good decisions under uncertainty with limited time and information, using both planning and learning techniques to balance the need for speed with solution quality.},
results = {The MCTS-based approach successfully solves the stationing and dispatch problem for real transit instances, outperforming greedy baseline approaches. The system demonstrates the ability to handle both pre-planned scheduling for known trip patterns and dynamic reallocation when disruptions occur. Results show how tree search methods can effectively explore the space of alternative dispatching strategies to find solutions that minimize passenger impact.},
keywords = {transit dispatch, vehicle routing, disruption response, online optimization, Monte Carlo tree search, resource allocation, real-time decision-making},
project_tags = {transit, emergency, POMDP, middleware}
}
Public bus transit systems provide critical transportation services for large sections of modern communities. On-time performance and reliable quality of service are therefore very important. Unfortunately, disruptions caused by overcrowding, vehicular failures, and road accidents often lead to service performance degradation. Though transit agencies keep a limited number of vehicles in reserve and dispatch them to relieve the affected routes during disruptions, the procedure is often ad-hoc and has to rely on human experience and intuition to allocate resources (vehicles) to affected trips under uncertainty. In this paper, we describe a principled approach using non-myopic sequential decision procedures to solve the problem and decide (a) if it is advantageous to anticipate problems and proactively station transit buses near areas with a high likelihood of disruptions and (b) if and which vehicle to dispatch to a particular problem. Our approach was developed in partnership with the Metropolitan Transportation Authority for a mid-sized city in the USA and models the system as a semi-Markov decision problem (solved as a Monte-Carlo tree search procedure) and shows that it is possible to obtain an answer to these two coupled decision problems in a way that maximizes the overall reward (number of people served). We sample many possible futures from generative models, each of which is assigned to a tree and processed using root parallelization. We validate our approach using 3 years of data from our partner agency. Our experiments show that the proposed framework serves 2% more passengers while reducing deadhead miles by 40%.
@inproceedings{baiting2024AAMAS,
author = {Luo, Baiting and Zhang, Yunuo and Dubey, Abhishek and Mukhopadhyay, Ayan},
booktitle = {Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems},
title = {Act as You Learn: Adaptive Decision-Making in Non-Stationary Markov Decision Processes},
year = {2024},
address = {Richland, SC},
pages = {1301--1309},
acceptance = {20},
publisher = {International Foundation for Autonomous Agents and Multiagent Systems},
series = {AAMAS '24},
contribution = {colab},
isbn = {9798400704864},
keywords = {non-stationary environments, adaptive learning, decision-making under uncertainty, Monte Carlo tree search, policy learning, risk-aware planning, dynamic systems},
location = {Auckland, New Zealand},
numpages = {9},
what = {This paper addresses adaptive decision-making in non-stationary Markov decision processes where the environment changes over time and the agent's learned policy may become outdated. The researchers develop an approach that combines offline learning using stored policy values with online Monte Carlo tree search to handle environments where both the dynamics and reward structures can shift. The method employs a dual-phase adaptive sampling strategy that balances exploration of unfamiliar regions with exploiting promising actions based on both the previous policy and current environment estimates.},
why = {Most decision-making algorithms assume either completely known environments or stationary dynamics, neither of which holds in real-world systems like emergency response where conditions change unpredictably. This work is innovative because it explicitly addresses the challenge of maintaining safety and performance as the environment evolves. By combining risk-averse tree search with Bayesian uncertainty quantification, the approach enables agents to learn quickly from new data while avoiding pessimistic planning that would sacrifice performance.},
results = {The proposed approach demonstrates superior adaptation compared to standard Monte Carlo tree search and other baselines across multiple environments including control and navigation tasks. The method successfully learns updated policies for new environments while maintaining robustness to changing dynamics. Experiments on standard benchmarks show that the risk-aware sampling strategy enables faster convergence and better performance than approaches that treat environment changes monolithically, proving the value of explicitly modeling uncertainty.},
project_tags = {POMDP, scalable AI, middleware}
}
A fundamental challenge in sequential decision-making is dealing with non-stationary environments, where exogenous environmental conditions change over time. Such problems are traditionally modeled as non-stationary Markov decision processes (NS-MDP). However, existing approaches for decision-making in NS-MDPs have two major shortcomings: first, they assume that the updated environmental dynamics at the current time are known (although future dynamics can change); and second, planning is largely pessimistic, i.e., the agent acts “safely” to account for the non-stationary evolution of the environment. We argue that both these assumptions are invalid in practice: updated environmental conditions are rarely known, and as the agent interacts with the environment, it can learn about the updated dynamics and avoid being pessimistic, at least in states whose dynamics it is confident about. We present a heuristic search algorithm called Adaptive Monte Carlo Tree Search (ADA-MCTS) that addresses these challenges. We show that the agent can learn the updated dynamics of the environment over time and then act as it learns, i.e., if the agent is in a region of the state space about which it has updated knowledge, it can avoid being pessimistic. To quantify “updated knowledge,” we disintegrate the aleatoric and epistemic uncertainty in the agent’s updated belief and show how the agent can use these estimates for decision-making. We compare the proposed approach with multiple state-of-the-art approaches in decision-making across multiple well-established open-source problems and empirically show that our approach is faster and more adaptive without sacrificing safety.
@inproceedings{wilbur2023mobility,
author = {Wilbur, Michael and Coursey, Maxime and Koirala, Pravesh and Al-Quran, Zakariyya and Pugliese, Philip and Dubey, Abhishek},
booktitle = {Proceedings of the ACM/IEEE 14th International Conference on Cyber-Physical Systems (with CPS-IoT Week 2023)},
title = {Mobility-On-Demand Transportation: A System for Microtransit and Paratransit Operations},
year = {2023},
address = {New York, NY, USA},
note = {demonstration},
pages = {260--261},
publisher = {Association for Computing Machinery},
series = {ICCPS '23},
contribution = {lead},
doi = {10.1145/3576841.3589625},
isbn = {9798400700361},
keywords = {mobility-on-demand, software systems, microtransit, paratransit, operational software, deployment, system integration, transportation technology},
location = {San Antonio, TX, USA},
numpages = {2},
url = {https://doi.org/10.1145/3576841.3589625},
what = {This paper presents a comprehensive software system for managing mobility-on-demand services including microtransit and paratransit operations. The SmartTransit.AI system provides web-based interfaces for operational management, mobile applications for drivers and users, and modular optimization components that can accommodate different algorithms and constraints. The paper describes the architecture, implementation challenges, and deployment experiences from real-world testing with transit agencies.},
why = {Despite advances in optimization algorithms, deploying ridesharing systems in practice requires solving numerous challenges beyond pure algorithmic optimization including user interfaces, real-time data integration, and operational constraints. This work is valuable because it demonstrates how research algorithms can be integrated into functional systems that transit agencies can actually deploy. The modular architecture enables different agencies to adopt the system while customizing it to their specific operational needs.},
results = {The SmartTransit.AI system successfully demonstrates the feasibility of deploying advanced optimization algorithms in real transit operations. The integrated software system handles both offline planning and real-time optimization for shared mobility services. Real-world deployment results show the system's ability to improve operational efficiency while maintaining usability for operators and accessibility for passengers.},
project_tags = {transit, middleware}
}
New rideshare and shared-mobility services have transformed urban mobility in recent years. Therefore, transit agencies are looking for ways to adapt to this rapidly changing environment. In this space, ridepooling has the potential to improve efficiency and reduce costs by allowing users to share rides in high-capacity vehicles and vans. Most transit agencies already operate various ridepooling services including microtransit and paratransit. However, the objectives and constraints for implementing these services vary greatly between agencies. This brings multiple challenges. First, off-the-shelf ridepooling formulations must be adapted for real-world conditions and constraints. Second, the lack of modular and reusable software makes it hard to implement and evaluate new ridepooling algorithms and approaches in real-world settings. Therefore, we propose an on-demand transportation scheduling software for microtransit and paratransit services. This software is aimed at transit agencies looking to incorporate state-of-the-art rideshare and ridepooling algorithms in their everyday operations. We provide management software for dispatchers and mobile applications for drivers and users. Lastly, we discuss the challenges in adapting state-of-the-art methods to real-world operations.
@inproceedings{sivagnanam2022offline,
author = {Sivagnanam, Amutheezan and Kadir, Salah Uddin and Mukhopadhyay, Ayan and Pugliese, Philip and Dubey, Abhishek and Samaranayake, Samitha and Laszka, Aron},
booktitle = {31st International Joint Conference on Artificial Intelligence (IJCAI)},
title = {Offline Vehicle Routing Problem with Online Bookings: A Novel Problem Formulation with Applications to Paratransit},
year = {2022},
acceptance = {15},
month = jul,
contribution = {colab},
preprint = {https://arxiv.org/abs/2204.11992},
what = {This work addresses the offline vehicle routing problem with online bookings for paratransit services, where pickup windows are selected at the time of booking rather than predetermined. The authors propose a formulation combining an offline vehicle routing model with an online bookings model, and present computational approaches including an anytime algorithm with reinforcement learning and a Markov decision process formulation.},
why = {Paratransit services for elderly and disabled passengers require high flexibility in response to real-time requests while maintaining operational efficiency. This work is novel because it bridges the gap between offline and online routing problems with practical constraints on pickup windows. The combination of optimization and learning approaches enables the system to adapt to dynamic demand while respecting the transportation agency's operational requirements.},
results = {The proposed methods were evaluated using real-world paratransit data from Chattanooga, showing that the anytime algorithm with learning outperforms baseline approaches. The reinforcement learning approach effectively learns policies that balance responsiveness to immediate requests with long-term efficiency considerations. The experimental results demonstrate significant improvements in cost reduction and robustness when environmental conditions change dynamically.},
keywords = {vehicle routing, online optimization, paratransit services, reinforcement learning, demand-responsive transport},
project_tags = {transit, planning, scalable AI, POMDP}
}
Vehicle routing problems (VRPs) can be divided into two major categories: offline VRPs, which consider a given set of trip requests to be served, and online VRPs, which consider requests as they arrive in real-time. Based on discussions with public transit agencies, we identify a real-world problem that is not addressed by existing formulations: booking trips with flexible pickup windows (e.g., 3 hours) in advance (e.g., the day before) and confirming tight pickup windows (e.g., 30 minutes) at the time of booking. Such a service model is often required in paratransit service settings, where passengers typically book trips for the next day over the phone. To address this gap between offline and online problems, we introduce a novel formulation, the offline vehicle routing problem with online bookings. This problem is very challenging computationally since it faces the complexity of considering large sets of requests—similar to offline VRPs—but must abide by strict constraints on running time—similar to online VRPs. To solve this problem, we propose a novel computational approach, which combines an anytime algorithm with a learning-based policy for real-time decisions. Based on a paratransit dataset obtained from our partner transit agency, we demonstrate that our novel formulation and computational approach lead to significantly better outcomes in this service setting than existing algorithms.
These foundational methods power our use-inspired projects: