Why This Matters

Vehicle-to-building energy management is a high-dimensional, continuous control problem under uncertainty, where traditional optimization methods struggle with real-time responsiveness and scalability. This work combines deep reinforcement learning with domain-specific constraints and knowledge, enabling scalable learning of near-optimal charging policies that adapt to building dynamics and user behavior without explicit model calibration.

What We Did

This work proposes a reinforcement learning approach for vehicle-to-building (V2B) charging that combines Deep Deterministic Policy Gradient (DDPG) with action masking and MILP-driven policy guidance. The framework models V2B as a Markov decision process with a continuous, constrained action space: action masking keeps every action feasible, and policy guidance improves learning efficiency. The approach incorporates domain knowledge about charging physics, building loads, and grid constraints while remaining flexible enough to adapt to new operational scenarios.
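To make the idea of action masking in a continuous action space concrete, the following is a minimal sketch, not the paper's implementation. It assumes a single charger whose action is a signed power in kW (negative means discharging to the building), a lossless battery, and an actor that outputs a value in [-1, 1]. The function names and parameters (`feasible_power_bounds`, `mask_action`, `soc_kwh`, and so on) are hypothetical and chosen only for illustration.

```python
import numpy as np


def feasible_power_bounds(soc_kwh, capacity_kwh, max_charge_kw,
                          max_discharge_kw, dt_hours):
    """Return (low, high) feasible power in kW for one time step.

    Assumption: lossless battery; negative power means discharging.
    """
    # Charging is limited by the charger rating and the remaining headroom.
    high = min(max_charge_kw, (capacity_kwh - soc_kwh) / dt_hours)
    # Discharging is limited by the charger rating and the stored energy.
    low = -min(max_discharge_kw, soc_kwh / dt_hours)
    return low, high


def mask_action(raw_action, low, high):
    """Map an actor output in [-1, 1] onto the feasible interval [low, high]."""
    a = float(np.clip(raw_action, -1.0, 1.0))
    return low + 0.5 * (a + 1.0) * (high - low)


# Hypothetical example: a 60 kWh battery that is nearly full.
low, high = feasible_power_bounds(soc_kwh=59.0, capacity_kwh=60.0,
                                  max_charge_kw=10.0, max_discharge_kw=10.0,
                                  dt_hours=0.25)
power_kw = mask_action(1.0, low, high)  # "charge as much as possible"
```

Because the raw actor output is rescaled into the feasible interval rather than rejected, every action the agent takes respects the charger and battery limits. A fuller mask would also have to account for each vehicle's required state of charge at departure, which this sketch omits.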

Key Results

Evaluation on real EV fleet data from a major manufacturer demonstrates significant cost savings while meeting all user charging requirements and grid constraints. The learned policies achieve substantial improvements in demand charge reduction and total operating costs compared to both heuristic baselines and model-predictive control approaches.
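For readers unfamiliar with demand charges, the sketch below shows why the reward is delayed: the demand charge depends on the single highest net load in the billing period, so it is only known once the period ends. This is an illustrative calculation under simplifying assumptions (a flat energy price, one billing period, fixed-length time steps), not the tariff used in the paper. The function name `monthly_bill` and all prices are hypothetical.

```python
def monthly_bill(net_load_kw, energy_price_per_kwh, demand_price_per_kw,
                 dt_hours):
    """Return the total bill for one billing period.

    Assumptions: flat energy price, no payment for exporting power,
    and a demand charge on the single highest net load in the period.
    """
    # Energy cost accrues at every step in which the building draws power.
    energy_kwh = sum(max(p, 0.0) * dt_hours for p in net_load_kw)
    # The demand charge depends only on the peak, known at period end.
    peak_kw = max(max(net_load_kw), 0.0)
    return energy_price_per_kwh * energy_kwh + demand_price_per_kw * peak_kw


# Hypothetical hourly net loads (kW) with the same total energy.
peaky = [100.0, 200.0, 150.0, 100.0]
flattened = [140.0, 140.0, 135.0, 135.0]

bill_peaky = monthly_bill(peaky, 0.5, 10.0, 1.0)
bill_flat = monthly_bill(flattened, 0.5, 10.0, 1.0)
```

Both profiles consume the same energy, so the difference between the two bills comes entirely from the lower peak. Shifting load through charging and discharging reduces cost in exactly this way.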

Cite This Paper

@inproceedings{liu2024reinforcement,
  author = {Liu, Fangqi and Sen, Rishav and Talusan, Jose and Pettet, Ava and Kandel, Aaron and Suzue, Yoshinori and Mukhopadhyay, Ayan and Dubey, Abhishek},
  booktitle = {Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems, {AAMAS} 2025, Detroit, Michigan},
  title = {Reinforcement Learning-based Approach for Vehicle-to-Building Charging with Heterogeneous Agents and Long Term Rewards},
  year = {2025},
  address = {Richland, SC},
  note = {nominated for best paper},
  organization = {International Conference on Autonomous Agents and Multi-Agent Systems},
  publisher = {International Foundation for Autonomous Agents and Multiagent Systems},
  series = {AAMAS '25},
  abstract = {Strategic aggregation of electric vehicle batteries as energy reservoirs can optimize power grid demand, benefiting smart and connected communities, especially large office buildings that offer workplace charging. This involves optimizing charging and discharging to reduce peak energy costs and net peak demand, monitored over extended periods (e.g., a month), which involves making sequential decisions under uncertainty and delayed and sparse rewards, a continuous action space, and the complexity of ensuring generalization across diverse conditions. Existing algorithmic approaches, e.g., heuristic-based strategies, fall short in addressing real-time decision-making under dynamic conditions, and traditional reinforcement learning (RL) models struggle with large state-action spaces, multi-agent settings, and the need for long-term reward optimization. To address these challenges, we introduce a novel RL framework that combines the Deep Deterministic Policy Gradient approach (DDPG) with action masking and efficient MILP-driven policy guidance. Our approach balances the exploration of continuous action spaces to meet user charging demands. Using real-world data from a major electric vehicle manufacturer, we show that our approach comprehensively outperforms many well-established baselines and several scalable heuristic approaches, achieving significant cost savings while meeting all charging requirements. Our results show that the proposed approach is one of the first scalable and general approaches to solving the V2B energy management challenge.},
  acceptance = {24.5},
  category = {selective},
  contribution = {lead},
  location = {Detroit, Michigan},
  keywords = {electric vehicle charging, reinforcement learning, deep deterministic policy gradient, building energy management, demand response, stochastic control}
}

Quick Info

Year: 2025
Series: AAMAS '25
Keywords: electric vehicle charging, reinforcement learning, deep deterministic policy gradient, building energy management, demand response, stochastic control
Research Areas: energy, planning, ML for CPS