Graph neural networks, normalizing flows, and deep learning methods that make sense of thousands of sensor streams across cities.
The planning and decision-making research described across our other areas depends on a critical upstream capability: accurate, timely predictions about the state of the world. A transit dispatch planner needs demand forecasts to position vehicles proactively. An emergency response system needs incident predictions to pre-stage responders. An energy management system needs load forecasts to schedule EV charging. And all of these systems need to know when something has gone wrong — when a traffic incident has occurred, when a power grid anomaly signals incipient instability, when sensor readings have drifted from reality.
These prediction and detection problems share common structure. The data is inherently networked: traffic flow at one intersection depends on its neighbors, power consumption at one node affects the grid, and transit delays cascade through connected routes. It is non-stationary: Monday morning rush hour looks nothing like Saturday afternoon. It is noisy: sensors fail, readings are missing, and ground truth labels are scarce. And it must be processed in real time — a traffic incident detected five minutes late is five minutes of worsened congestion and delayed emergency response.
Our research develops ML methods purpose-built for these challenges, with two complementary goals: detecting anomalies and faults in real time, and forecasting the quantities that planning algorithms need to make good decisions.
Anomaly detection in CPS is fundamentally a density estimation problem: if you can model what “normal” looks like, then observations in low-density regions of the distribution are anomalies. But modeling normality in complex infrastructure systems is hard. Power grid synchrophasors produce high-frequency multivariate time series where normal behavior involves subtle correlations across dozens of sensors. Water treatment systems exhibit slow operational trends punctuated by fast localized disruptions. And critically, training data in real-world CPS is rarely clean — it contains unlogged maintenance events, rare operating modes, and other anomalies that contaminate any unsupervised model that assumes anomaly-free training sets.
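The density-estimation view above can be sketched in a few lines. This is a minimal illustration, assuming a one-dimensional Gaussian as the model of "normal" and a low quantile of the training log-densities as the alert threshold; the readings, the `quantile` value, and the function names are hypothetical, and the methods described on this page use far richer density models.

```python
import math
import statistics

def fit_gaussian(samples):
    """Estimate the mean and standard deviation of historical readings."""
    return statistics.fmean(samples), statistics.stdev(samples)

def log_density(x, mean, std):
    """Log of the Gaussian probability density at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2.0 * math.pi))

def anomaly_threshold(samples, mean, std, quantile=0.05):
    """Log-density below which a reading is flagged (hypothetical rule)."""
    scores = sorted(log_density(s, mean, std) for s in samples)
    return scores[int(quantile * len(scores))]

# Hypothetical voltage-like readings centered on 1.0 per unit.
history = [1.00, 1.01, 0.99, 1.02, 0.98, 1.00, 1.01, 0.99, 1.00, 1.02,
           0.98, 1.01, 0.99, 1.00, 1.01, 0.99, 1.00, 1.02, 0.98, 1.00]
mean, std = fit_gaussian(history)
threshold = anomaly_threshold(history, mean, std)

def is_anomaly(x):
    """A reading is anomalous when it falls in a low-density region."""
    return log_density(x, mean, std) < threshold
```

With these assumed readings, `is_anomaly(1.30)` flags the reading while `is_anomaly(1.00)` does not. Note that the fit itself is exactly the step that contaminated training data can distort, which is the difficulty the paragraph above describes.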
WENFlow (ICCPS 2026) addresses these challenges through a combination of wavelet transforms and normalizing flows. The discrete wavelet transform decomposes each sensor’s time series into multi-scale components — separating slow trends from fast transients — and a gated selective self-attention mechanism identifies which sensors are most informative at each moment, scaling linearly rather than quadratically with the number of system features. The normalizing flow then estimates the conditional density of the decomposed features, providing principled likelihood-based anomaly scores that adapt to context.
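To make the multi-scale decomposition concrete, here is a minimal sketch of a discrete wavelet transform using the Haar wavelet. The choice of Haar, the number of levels, and the sample trace are assumptions for illustration only; the source does not state which wavelet family WENFlow uses.

```python
import math

def haar_step(signal):
    """One level of the Haar DWT: returns (approximation, detail)."""
    s = math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) / s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) / s for i in pairs]
    return approx, detail

def haar_decompose(signal, levels):
    """Multi-level decomposition.

    Returns the final (coarsest) approximation and a list of detail
    coefficient lists ordered from finest to coarsest scale.
    The signal length is assumed to be divisible by 2**levels.
    """
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details

# Hypothetical sensor trace: a slow ramp with one fast spike at index 5.
trace = [0.0, 0.1, 0.2, 0.3, 0.4, 2.5, 0.6, 0.7]
approx, details = haar_decompose(trace, levels=2)
```

In this sketch the spike dominates the finest-scale detail coefficients (`details[0]`), while the slow ramp is carried mostly by the approximation coefficients, which is the separation of fast transients from slow trends described above.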
Two properties make WENFlow particularly suited to real CPS deployment. First, it is robust to contaminated training data — unlike methods that assume clean training sets, WENFlow’s density estimation naturally handles the presence of anomalies in historical data, which is the realistic operating condition for any infrastructure system. Second, the wavelet decomposition provides interpretable analysis: operators can see which temporal scales (slow drift vs. fast spike) and which sensors contributed to an anomaly score, moving beyond a binary “anomaly / not anomaly” toward actionable diagnostic information.
This work builds on our earlier wavelet-based synchrophasor monitoring (SMARTCOMP 2023), which combined discrete wavelet transforms with convolutional autoencoders to detect incipient power grid instability from phasor measurement unit data — identifying precursors 5–10 minutes before classical protection systems trigger.
Detecting that something is wrong is only half the problem. The harder question for operators is where: when a sensor detects congestion, the actual incident may be miles upstream, with the observed effects having propagated through the road network. Telling a dispatcher “there’s congestion somewhere on I-40” is far less useful than “the likely cause is a stalled vehicle at mile marker 42.”
TRACE (Traffic Response Anomaly Capture Engine) combines graph neural networks, transformers, and probabilistic normalizing flows to detect and localize traffic incidents in real time. The graph structure captures the road network topology, that is, how disturbances propagate from one segment to its neighbors; the transformer captures temporal dynamics; and the normalizing flow provides uncertainty-calibrated anomaly scores. Deployed with the Tennessee Department of Transportation across more than 1,000 sensors statewide, TRACE improved incident localization by 0.6 miles (17%) over prior methods while maintaining similar detection accuracy.
The localization capability is what makes TRACE operationally valuable. By modeling the spatiotemporal propagation of traffic disturbances across the road network graph, the system traces observed downstream effects back to their likely origin point — giving first responders a specific location to investigate rather than a general area of congestion.
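The idea of tracing effects back to an origin can be illustrated with a deliberately simple heuristic. This is not TRACE's learned model: it assumes we already know which road segments are anomalous and when each anomaly began, then walks the connected anomalous region and picks the segment with the earliest onset. The segment names, onset times, and the `likely_origin` function are all hypothetical.

```python
from collections import deque

def likely_origin(adjacency, onset_minute, alert_segment):
    """Return the anomalous segment with the earliest onset time.

    adjacency: dict mapping each segment to its neighboring segments.
    onset_minute: dict mapping anomalous segments to anomaly start time.
    alert_segment: the anomalous segment where the alert was raised.
    Only segments reachable from alert_segment through other anomalous
    segments are considered (breadth-first search).
    """
    seen = {alert_segment}
    queue = deque([alert_segment])
    while queue:
        seg = queue.popleft()
        for nbr in adjacency.get(seg, ()):
            if nbr in onset_minute and nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return min(seen, key=lambda seg: onset_minute[seg])

# Hypothetical corridor A - B - C - D - E; A and E show no anomaly.
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"],
             "D": ["C", "E"], "E": ["D"]}
onset_minute = {"B": 12, "C": 7, "D": 3}
origin = likely_origin(adjacency, onset_minute, alert_segment="B")
```

Here the alert is raised at segment B, but the disturbance began earliest at segment D, so D is returned as the place to investigate first.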
Our work on traffic anomaly detection spans several years and approaches: from Pythagorean mean-based invariants for weakly unsupervised detection across metropolitan areas, to large-scale incident detection in smart transportation systems using incremental region-growing algorithms, to decentralized real-time anomaly detection for transportation networks. Each contribution addressed a different facet of the detection problem — scalability, localization precision, decentralized operation — and the insights from this progression are reflected in TRACE’s integrated architecture.
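As a rough illustration of a Pythagorean mean-based invariant, the sketch below uses the ratio of the harmonic mean to the arithmetic mean of segment speeds within a cluster. The exact form of the invariant in the cited papers is not reproduced here; this ratio, the speed values, and the function name are assumptions chosen to show the behavior of a quantity that is stable without incidents and deviates when one occurs.

```python
from statistics import fmean, harmonic_mean

def mean_ratio(speeds):
    """Harmonic mean divided by arithmetic mean of positive speeds.

    The ratio is at most 1 and equals 1 only when all speeds are equal,
    so it drops when one segment slows down relative to the others.
    """
    return harmonic_mean(speeds) / fmean(speeds)

# Hypothetical speeds (mph) for four correlated road segments.
normal = [52.0, 55.0, 50.0, 54.0]
incident = [52.0, 55.0, 12.0, 54.0]  # third segment slows sharply

ratio_normal = mean_ratio(normal)
ratio_incident = mean_ratio(incident)
```

With these assumed values the ratio stays close to 1 under normal conditions and falls well below it when one segment slows, which is why highly correlated clusters make such a quantity useful as a detection metric.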
Anomaly detection tells you when something has gone wrong. Forecasting tells planning algorithms what is likely to happen next — and the quality of the forecast directly determines the quality of the plan.
Our forecasting work spans multiple CPS domains. For transit demand, we developed graph neural network models that predict bus ridership by incorporating the spatial relationships between routes and stops — capturing how demand at one stop correlates with nearby stops and connecting services. The graph structure is essential because transit demand is not independent across the network: a surge at a transfer hub propagates to every route it serves. For energy consumption, our forecasting work for mixed-vehicle fleets predicts charging demand across heterogeneous vehicle types — electric buses, paratransit vans, school buses — accounting for the different usage patterns, battery characteristics, and scheduling constraints of each vehicle class.
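The role of graph structure can be sketched with a single message-passing step. This is a simplified illustration, not the published model: it averages each stop's feature with its neighbors' features and omits the learned weight matrix and nonlinearity that a real graph convolutional layer would apply. The stop names and boarding counts are hypothetical.

```python
def propagate(adjacency, features):
    """One mean-aggregation step with self-loops.

    Each stop's new feature is the average of its own value and the
    values of its neighboring stops.
    """
    updated = {}
    for stop, value in features.items():
        neighborhood = [value] + [features[n] for n in adjacency[stop]]
        updated[stop] = sum(neighborhood) / len(neighborhood)
    return updated

# Hypothetical transfer hub connected to three stops on different routes.
adjacency = {"hub": ["s1", "s2", "s3"],
             "s1": ["hub"], "s2": ["hub"], "s3": ["hub"]}
boardings = {"hub": 40.0, "s1": 0.0, "s2": 0.0, "s3": 0.0}
smoothed = propagate(adjacency, boardings)
```

After one step, the surge observed only at the hub is visible in the representation of every connected stop, which is the kind of dependency an independent per-stop model cannot capture.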
A particularly important forecasting contribution is MoveOD (ICCPS 2026), which addresses a data availability problem that has long constrained transportation planning. High-resolution origin-destination (OD) data — where people travel from and to — is critical for simulation, planning, and evaluation, but it is expensive to collect and rarely available outside a few data-rich metropolitan areas. MoveOD synthesizes fine-grained commuter OD flows for any U.S. county by fusing publicly available datasets: Census travel-time distributions, LODES workplace flows, OpenStreetMap road networks, and building footprints. The result is a practical tool for creating transportation digital twins anywhere in the country — enabling communities that lack expensive travel surveys to still benefit from data-driven planning.
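One small piece of any OD synthesis pipeline is splitting an aggregate flow into finer bins without changing its total. The sketch below does this with largest-remainder rounding. It is an illustrative assumption, not MoveOD's actual procedure; the departure shares, the trip count, and the `allocate_trips` function are hypothetical.

```python
def allocate_trips(total_trips, shares):
    """Split an integer trip count across bins in proportion to shares.

    Uses largest-remainder rounding so that the integer bin counts
    always sum exactly to total_trips.
    """
    norm = sum(shares.values())
    exact = {k: total_trips * v / norm for k, v in shares.items()}
    counts = {k: int(x) for k, x in exact.items()}
    leftover = total_trips - sum(counts.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - counts[k],
                          reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts

# Hypothetical departure-time shares for one home-to-work pair.
departure_shares = {"06:00": 0.2, "07:00": 0.5, "08:00": 0.3}
counts = allocate_trips(17, departure_shares)
```

The design point is consistency: however the trips are disaggregated in space or time, the synthesized counts still add up to the observed commuter total.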
A recurring theme across our anomaly detection and forecasting work is that “normal” is context-dependent. Traffic flow that would be anomalous at 2 PM on a sunny Tuesday is perfectly expected during a rainy Friday rush hour. Fixed thresholds produce unacceptable false positive rates; operators learn to ignore the alerts, and the system loses trust.
Our use of conditional normalizing flows (generative models that condition their density estimates on contextual variables such as time of day, weather, recent history, and event calendars) addresses this directly. The same architectural choice appears across WENFlow (conditioning on temporal scale and sensor context), TRACE (conditioning on network topology and traffic patterns), and our conditional flow-based traffic anomaly scoring. By adapting the definition of normality to the current context, these methods reduce false positives while maintaining sensitivity to true anomalies.
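Context-dependent normality can be sketched without any deep learning. The example below fits a separate Gaussian per context label and scores a reading against the model for its own context. This is a simplified stand-in for a conditional density model, not a normalizing flow; the context labels, speeds, and function names are hypothetical.

```python
from statistics import fmean, stdev

def fit_by_context(records):
    """Group readings by context label and fit (mean, std) per group."""
    groups = {}
    for context, value in records:
        groups.setdefault(context, []).append(value)
    return {c: (fmean(v), stdev(v)) for c, v in groups.items()}

def z_score(value, context, models):
    """Distance of a reading from its context's mean, in standard deviations."""
    mean, std = models[context]
    return abs(value - mean) / std

# Hypothetical average speeds (mph) on one segment in two contexts.
records = [("rush", 22.0), ("rush", 25.0), ("rush", 20.0), ("rush", 23.0),
           ("midday", 61.0), ("midday", 64.0), ("midday", 60.0), ("midday", 63.0)]
models = fit_by_context(records)

rush_score = z_score(24.0, "rush", models)
midday_score = z_score(24.0, "midday", models)
```

The same reading of 24 mph scores as ordinary during rush hour and as a large deviation at midday, which is the behavior a single fixed threshold cannot reproduce.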
Looking forward, the connection between prediction and planning is becoming tighter. The forecasting models described here feed directly into the online planners that make dispatch and scheduling decisions. The anomaly detection methods alert the non-stationarity detection mechanisms that trigger model updates and replanning. And the interpretability of wavelet-based decomposition and attention-based feature selection connects to our explainability research — operators need to understand not just what the prediction is, but why the model is confident or uncertain. These connections are not incidental; they reflect the design of a research program where prediction, planning, and explanation form a coherent pipeline for AI-driven CPS.
Selected Publications:
@inproceedings{iccps2026_wenflow,
author = {Buckelew, Jacob and Talusan, Jose Paolo and Sivaramakrishnan, Vasavi and Mukhopadhyay, Ayan and Srivastava, Anurag and Dubey, Abhishek},
title = {WENFlow: Wavelet-Enhanced Normalizing Flows for Real-Time Anomaly Detection in CPS},
year = {2026},
booktitle = {Proceedings of the HSCC/ICCPS 2026: 29th ACM International Conference on Hybrid Systems: Computation and Control and 17th ACM/IEEE International Conference on Cyber-Physical Systems},
location = {Saint Malo, France},
keywords = {anomaly detection, cyber-physical systems, wavelet transforms, normalizing flows, spatiotemporal analysis, unsupervised learning, interpretability},
note = {Acceptance rate: 28\%; Regular Paper; Track: Foundations},
series = {HSCC/ICCPS '26},
what = {WENFlow proposes a wavelet-enabled normalizing flow framework for unsupervised anomaly detection in high-dimensional cyber-physical systems. The work addresses the challenge of detecting subtle anomalies in systems like power grids and water networks that exhibit complex spatiotemporal patterns. WENFlow combines discrete wavelet transform for multi-scale temporal feature extraction with gated selective self-attention to identify critical sensors, conditional density estimation for likelihood-based anomaly scoring, and interpretable analysis through log-density and feature importance.},
why = {Real-time anomaly detection in complex infrastructure systems requires capturing both slow operational trends and fast localized disruptions, with scalable robustness to contaminated training data and high dimensionality. Existing methods struggle with spatiotemporal dependencies and contamination from unlogged maintenance events. WENFlow is innovative because it achieves linear complexity scaling with sensor dimensionality through wavelet decomposition and feature-wise attention, providing both accurate anomaly detection and interpretable explanations of which sensors and temporal patterns indicate anomalies.},
results = {Extensive evaluation on power grid and water treatment benchmarks demonstrates WENFlow achieves superior anomaly detection performance compared to state-of-the-art methods including transformers and density-based approaches, while maintaining linear scaling with system dimensionality and robustness to contaminated training data. The framework provides interpretable analysis through feature importance scores and temporal pattern visualization.},
project_tags = {CPS, ML for CPS, Explainable AI}
}
Real-time anomaly detection in high-dimensional data is crucial for ensuring the security of cyber-physical systems (CPS) such as power grids and water distribution networks. Such data commonly take the form of multivariate time series and are often unlabeled, necessitating unsupervised detection methods. However, many unsupervised deep learning methods make assumptions about the normality of training data, which is unrealistic in real-world CPS where training data often contain anomalies or rare patterns. Furthermore, these methods rely on inefficient mechanisms to learn spatiotemporal dependencies in the data and scale quadratically with the number of system features. To address these problems, we propose Wavelet-Enhanced Normalizing Flows (WENFlow), an unsupervised deep learning model that identifies anomalies in low-density regions of the data distribution and does not assume access to anomaly-free training data. Notably, WENFlow leverages a scalable Gated Selective Self-Attention mechanism for capturing the most critical spatial dependencies between features. Compared to existing models, WENFlow scales linearly with respect to the number of system features and meets real-time inference requirements for anomaly detection. In our experiments, WENFlow achieves superior AUC scores against baseline methods across datasets with varying anomaly ratios, showcasing its robustness against contaminated training data. We evaluate WENFlow on two real-world benchmark datasets and a simulated phasor measurement unit dataset collected from a power grid testbed.
@inproceedings{zulqarnain2025,
author = {Zulqarnain, Ammar and Buckelew, Jacob and Talusan, Jose Paolo and Mukhopadhyay, Ayan and Dubey, Abhishek},
booktitle = {2025 IEEE International Conference on Smart Computing (SMARTCOMP)},
title = {TRACE: Traffic Response Anomaly Capture Engine for Localization of Traffic Incidents},
year = {2025},
month = jun,
contribution = {lead},
what = {TRACE is a novel framework for real-time traffic anomaly detection and localization that combines Graph Neural Networks, Transformers, and normalizing flows. The system learns the spatial-temporal dependencies in road networks through graph convolutions while capturing long-range temporal interactions through transformer attention. To detect anomalies, TRACE computes log-likelihoods under a learned probability distribution, identifying points where traffic patterns deviate significantly from normal conditions. The framework provides both anomaly detection and localization through density-based analysis.},
why = {Traditional traffic anomaly detection methods struggle with the complexity of capturing spatial-temporal dependencies in interconnected road networks while maintaining scalability and interpretability. TRACE is innovative because it unifies multiple deep learning paradigms (graph neural networks, transformers, normalizing flows) within a probabilistic framework, enabling unsupervised anomaly detection without requiring labeled anomaly data. The density-based approach provides interpretable anomaly scores grounded in learned probability models.},
results = {Evaluation on real-world traffic data from a mid-sized US metropolitan area demonstrates that TRACE significantly improves incident localization precision by 17% compared to methods that identify anomalies without spatial localization. The framework achieves superior detection latency and mean localization error compared to state-of-the-art baselines.},
keywords = {traffic anomaly detection, graph neural networks, transformers, probabilistic modeling, spatial-temporal analysis, smart transportation, anomaly localization},
project_tags = {CPS, ML for CPS, transit}
}
Effective traffic incident management is critical for road safety and operational efficiency. Yet, many transportation agencies rely on reactionary methods, where incidents are reported by human agents and managed through rule-based frameworks like traditional Traffic Incident Management (TIM) systems. However, these are vulnerable to human error, oversight, and delays during high-stress conditions. Although recent initiatives incorporating real-time sensor data for corridor monitoring and enhanced roadway information systems represent strides toward modernization, these systems often still require substantial human intervention. Recent advancements in graph-based deep learning models offer promising potential for addressing the limitations of traditional methods. While state-of-the-art models exist, the complexities of incident localization within dynamic and interconnected road networks, along with limited availability of high-quality labeled data and variability in real-time traffic measurements, are still open challenges. To address these, we propose the Traffic Response Anomaly Capture Engine (TRACE), a novel approach that combines graph neural networks, transformers, and probabilistic normalizing flows to accurately detect and localize traffic anomalies in real time. TRACE captures spatial-temporal dependencies, manages data uncertainty, and enhances automation, supporting more precise and timely incident localization. Our approach is validated on real-world traffic data and improves incident localization by 0.6 miles (17%) over SOTA methods while maintaining similar incident detection accuracy and mean detection delay.
@inproceedings{Buckelew2023,
author = {Buckelew, Jacob and Basumallik, Sagnik and Sivaramakrishnan, Vasavi and Mukhopadhyay, Ayan and Srivastava, Anurag K. and Dubey, Abhishek},
booktitle = {2023 IEEE International Conference on Smart Computing (SMARTCOMP)},
title = {Synchrophasor Data Event Detection using Unsupervised Wavelet Convolutional Autoencoders},
year = {2023},
acceptance = {31},
pages = {326-331},
contribution = {lead},
doi = {10.1109/SMARTCOMP58114.2023.00080},
keywords = {power system monitoring, anomaly detection, wavelet analysis, autoencoders, unsupervised learning, phasor measurement units, grid events, real-time detection},
what = {This paper presents an unsupervised machine learning approach for detecting anomalies in power transmission systems using wavelet-based feature extraction combined with convolutional autoencoders. The method processes phasor measurement unit data using discrete wavelet transforms to extract time-frequency features, which are then fed into an autoencoder for anomaly detection. The approach is validated on hardware-in-the-loop simulations of the IEEE 14-bus system, achieving high detection accuracy without requiring labeled training data.},
why = {Reliable detection of grid events and anomalies is critical for maintaining power system stability and preventing cascading failures. Existing supervised approaches require extensive labeled datasets that are difficult to obtain in practice. This work is important because it demonstrates how unsupervised learning can automatically identify important features of grid events through wavelet analysis, enabling detection of diverse anomalies without labeled examples. The approach is practical for real-time grid monitoring applications.},
results = {The wavelet-convolutional autoencoder framework achieves 97.7% accuracy, 98% precision, and 99.5% recall on power system event detection tasks, substantially outperforming baseline approaches. The method successfully detects various types of grid events including faults and disturbances with minimal false positives. The unsupervised approach significantly reduces the burden of obtaining labeled training data, making it practical for deployment in operational grid monitoring systems.},
project_tags = {energy, CPS, ML for CPS}
}
Timely and accurate detection of events affecting the stability and reliability of power transmission systems is crucial for safe grid operation. This paper presents an efficient unsupervised machine-learning algorithm for event detection using a combination of discrete wavelet transform (DWT) and convolutional autoencoders (CAE) with synchrophasor measurements. These measurements are collected from a hardware-in-the-loop testbed setup equipped with a digital real-time simulator. Using DWT, the detail coefficients of measurements are obtained. The decomposed data is then fed into the CAE, which captures the underlying structure of the transformed data. Anomalies are identified when significant errors are detected between input samples and their reconstructed outputs. We demonstrate our approach on the IEEE-14 bus system considering different events such as generator faults, line-to-line faults, line-to-ground faults, load shedding, and line outages simulated on a real-time digital simulator (RTDS). The proposed implementation achieves a classification accuracy of 97.7%, precision of 98.0%, recall of 99.5%, and F1 score of 98.7%, and proves to be efficient in both time and space requirements compared to baseline approaches.
@techreport{barbour2024tdot,
author = {Barbour, William and Baroud, Hiba and Dubey, Abhishek and Sprinkle, Jonathan and Work, Daniel},
title = {TDOT RDS Data Quality Assurance and High-Resolution Content Enhancement},
year = {2024},
url = {https://trid.trb.org/View/2499199}
}
@article{tcpsislam24,
author = {Islam, Md. Jaminur and Talusan, Jose Paolo and Bhattacharjee, Shameek and Tiausas, Francis and Dubey, Abhishek and Yasumoto, Keiichi and Das, Sajal K.},
journal = {ACM Trans. Cyber-Phys. Syst.},
title = {Scalable Pythagorean Mean-based Incident Detection in Smart Transportation Systems},
year = {2024},
issn = {2378-962X},
month = may,
number = {2},
volume = {8},
address = {New York, NY, USA},
articleno = {20},
contribution = {colab},
doi = {10.1145/3603381},
issue_date = {April 2024},
keywords = {Weakly unsupervised learning, anomaly detection, smart transportation, graph algorithms, cluster analysis, regression, incident detection, approximation algorithm},
numpages = {25},
publisher = {Association for Computing Machinery},
url = {https://doi.org/10.1145/3603381}
}
Modern smart cities need smart transportation solutions to quickly detect various traffic emergencies and incidents in the city to avoid cascading traffic disruptions. To materialize this, roadside units and ambient transportation sensors are being deployed to collect speed data that enables the monitoring of traffic conditions on each road segment. In this article, we first propose a scalable data-driven anomaly-based traffic incident detection framework for a city-scale smart transportation system. Specifically, we propose an incremental region growing approximation algorithm for optimal spatio-temporal clustering of road segments and their data, such that road segments are strategically divided into highly correlated clusters. The highly correlated clusters enable identifying a Pythagorean Mean-based invariant as an anomaly detection metric that is highly stable under no incidents but shows a deviation in the presence of incidents. We learn the bounds of the invariants in a robust manner such that anomaly detection can generalize to unseen events, even when learning from real noisy data. Second, using cluster-level detection, we propose a folded Gaussian classifier to automatically pinpoint the particular segment in a cluster where the incident happened. We perform extensive experimental validation using mobility data collected from four cities in Tennessee and compare with state-of-the-art ML methods to show that our method can detect incidents within each cluster in real time and outperforms known ML methods.
@inproceedings{jp2022,
author = {Islam, Jaminur and Talusan, Jose Paolo and Bhattacharjee, Shameek and Tiausas, Francis and Vazirizade, Sayyed Mohsen and Dubey, Abhishek and Yasumoto, Keiichi and Das, Sajal},
booktitle = {ACM/IEEE 13th International Conference on Cyber-Physical Systems (ICCPS)},
title = {Anomaly based Incident Detection in Large Scale Smart Transportation Systems},
year = {2022},
month = apr,
publisher = {IEEE},
note = {Nominated for Best Paper Award},
acceptance = {30},
contribution = {lead},
what = {This paper presents a comprehensive tool-chain for anomaly detection in large-scale smart transportation systems using region growing approximation algorithms. The approach combines data-driven learning with spatial structure exploitation to identify traffic incidents across interconnected road segments while maintaining computational tractability. The framework uses harmonic mean and arithmetic mean metrics to detect deviations in transportation patterns.},
why = {Real-time incident detection in large transportation networks is challenging due to complex spatiotemporal dependencies and high data volumes. This work is significant because it proposes a theoretically grounded approach that guarantees invariance properties necessary for robust anomaly detection. The region growing algorithm addresses scalability challenges while maintaining accuracy in detecting true incidents.},
results = {The experimental evaluation using real traffic data from Nashville, Tennessee demonstrated that the proposed framework successfully detects incidents in real-time with high accuracy. The method's invariance under benign conditions ensures low false alarm rates while remaining sensitive to true incidents. The region growing approximation achieved computationally tractable solutions for large-scale networks without sacrificing detection performance.},
keywords = {anomaly detection, smart transportation, incident detection, graph algorithms, traffic monitoring},
project_tags = {transit, CPS, ML for CPS}
}
Modern smart cities are focusing on smart transportation solutions to detect and mitigate the effects of various traffic incidents in the city. To materialize this, roadside units and ambient transportation sensors are being deployed to collect vehicular data that provides real-time traffic monitoring. In this paper, we first propose a real-time data-driven anomaly-based traffic incident detection framework for a city-scale smart transportation system. Specifically, we propose an incremental region growing approximation algorithm for optimal spatio-temporal clustering of road segments and their data, such that road segments are strategically divided into highly correlated clusters. The highly correlated clusters enable identifying a Pythagorean Mean-based invariant as an anomaly detection metric that is highly stable under no incidents but shows a deviation in the presence of incidents. We learn the bounds of the invariants in a robust manner such that anomaly detection can generalize to unseen events, even when learning from real noisy data. We perform extensive experimental validation using mobility data collected from the City of Nashville, Tennessee, and show that the method can detect incidents within each cluster in real time.
@inproceedings{iccps2026_moveod,
author = {Sen, Rishav and Talusan, Jose Paolo and Dubey, Abhishek and Mukhopadhyay, Ayan and Samaranayake, Samitha and Laszka, Aron},
title = {MoveOD: Synthesizing Fine-Grained Origin--Destination Data for Transportation CPS},
year = {2026},
booktitle = {Proceedings of the HSCC/ICCPS 2026: 29th ACM International Conference on Hybrid Systems: Computation and Control and 17th ACM/IEEE International Conference on Cyber-Physical Systems},
location = {Saint Malo, France},
keywords = {origin-destination synthesis, travel demand, transportation planning, data fusion, Bayesian methods, public datasets, traffic simulation},
note = {Acceptance rate: 28\%; Short Paper; Track: Systems and Applications},
series = {HSCC/ICCPS '26},
what = {MoveOD presents a framework for synthesizing fine-grained origin-destination commute patterns from publicly available datasets by integrating census data, employment records, and road networks. The approach uses Bayesian decomposition to generate minute-level commute trip distributions while preserving spatial and temporal coherence with observed commuting patterns. The framework leverages public data sources including the American Community Survey, Longitudinal Employer-Household Dynamics (LODES), and OpenStreetMap to generate realistic synthetic commute data.},
why = {High-resolution origin-destination data is essential for transportation planning and traffic management, yet collecting such data through surveys or GPS tracking is expensive and privacy-invasive. Existing synthetic approaches fail to capture temporal and spatial granularity needed for realistic simulation. MoveOD is innovative because it demonstrates how publicly available marginal data can be combined through principled statistical methods to generate detailed, temporally-resolved commute patterns that preserve observed macro-level statistics while enabling microscopic simulations.},
results = {Validation on Hamilton County, Tennessee data demonstrates that the calibrated MoveOD approach accurately reproduces observed census commute patterns while generating realistic minute-level departure time distributions. The framework achieves alignment with ACS travel time margins through careful calibration, enabling fast synthetic data generation suitable for any US county and providing a reusable tool for transportation research.},
project_tags = {transit, planning}
}
High-resolution origin–destination (OD) tables are critical to cyber-physical transportation systems, enabling realistic digital twins, adaptive routing strategies, signal timing optimization, and demand-responsive mobility services. However, such OD data is rarely available outside a small number of data-rich metropolitan regions. We introduce MoveOD, an open-source pipeline that synthesizes publicly available datasets to generate fine-grained commuter OD flows with spatial and temporal departure distributions for any U.S. county. MoveOD fuses American Community Survey travel-time and departure distributions, Longitudinal Employer–Household Dynamics (LODES) residence–workplace flows, OpenStreetMap (OSM) road networks, and building footprint data. Our approach ensures consistency with observed commuter totals, workplace employment distributions, and reported travel durations. MoveOD is integrated with a transportation digital twin, enabling end-to-end CPS experimentation. We demonstrate the system in Hamilton County, Tennessee, generating approximately 150,000 synthetic daily trips and evaluating routing algorithms in a live dashboard.
@inproceedings{samir2024smartcomp,
author = {Gupta, Samir and Khanna, Agrima and Talusan, Jose Paolo and Said, Anwar and Freudberg, Dan and Mukhopadhyay, Ayan and Dubey, Abhishek},
booktitle = {2024 IEEE International Conference on Smart Computing (SMARTCOMP)},
title = {A Graph Neural Network Framework for Imbalanced Bus Ridership Forecasting},
year = {2024},
acceptance = {32.9},
month = jun,
contribution = {lead},
what = {This paper proposes a Graph Convolutional Network framework for bus ridership forecasting that addresses data sparsity and imbalance issues in public transit occupancy prediction. The approach combines graph neural networks to capture spatial-temporal dependencies with data augmentation and focal loss to handle the heavy-tail occupancy distribution. GCNs model bus networks as graphs where stops and routes capture the transit network structure, enabling the model to learn patterns specific to route dynamics.},
why = {Public transit systems require accurate occupancy forecasting for operational planning, but many routes exhibit sparse data with imbalanced occupancy distributions (most trips have low occupancy, few have high occupancy). GCN-based methods are innovative because they leverage the underlying graph structure of transit networks to learn more expressive representations while handling data sparsity through inductive learning across stops and routes, improving generalization.},
results = {Evaluation on real WEGo Public Transit data from Nashville demonstrates that the GCN approach significantly outperforms traditional baselines including random forest and XGBoost methods, with particular improvements in predicting high-occupancy events that are critical for preventing overcrowding and ensuring service quality.},
keywords = {ridership forecasting, graph neural networks, public transit, occupancy prediction, data imbalance, spatio-temporal modeling},
project_tags = {transit, ML for CPS}
}
Public transit systems play a key role in lowering carbon emissions and reducing urban congestion, supporting environmental sustainability. However, overcrowding degrades quality of service, passenger experience, and overall efficiency, which in turn reduces transit usage. It is therefore crucial to identify and forecast potential windows of overcrowding to improve passenger experience and encourage higher ridership. Predicting ridership is difficult because the collected data is noisy and overcrowding events are sparse. Existing studies of public transit ridership prediction consider only a static depiction of bus networks. We address these issues by first applying a data processing pipeline that cleans noisy data and engineers several features for training. Then, we address sparsity by converting the network to a dynamic graph and using a graph convolutional network, incorporating temporal, spatial, and auto-regressive features, to learn generalizable patterns for each route. Finally, since conventional loss functions such as categorical cross-entropy handle the class imbalance inherent in ridership data poorly, our approach uses focal loss to focus training on the less frequent yet task-critical overcrowding instances. Our experiments, using real-world data from our partner agency, show that the proposed approach outperforms existing state-of-the-art baselines in terms of accuracy and robustness.
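To make the role of focal loss in the abstract above concrete, here is a minimal sketch of the standard per-sample formulation, FL(p_t) = -alpha * (1 - p_t)^gamma * log(p_t), next to plain cross-entropy. It is not the paper's training code: the function names, the default `gamma` and `alpha` values, and the two-class probability vectors are assumptions chosen for illustration.

```python
import math
from typing import Sequence


def cross_entropy(probs: Sequence[float], target: int) -> float:
    """Categorical cross-entropy for one sample: -log(p_t)."""
    p_t = min(max(probs[target], 1e-12), 1.0)  # clamp to avoid log(0)
    return -math.log(p_t)


def focal_loss(probs: Sequence[float], target: int,
               gamma: float = 2.0, alpha: float = 1.0) -> float:
    """Focal loss for one sample: -alpha * (1 - p_t)**gamma * log(p_t).

    The (1 - p_t)**gamma factor shrinks the loss of confidently correct
    predictions, so rare, hard examples dominate the gradient.
    """
    p_t = min(max(probs[target], 1e-12), 1.0)
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical predictions over two classes: [not overcrowded, overcrowded].
easy = [0.1, 0.9]   # true class 1 predicted with high confidence
hard = [0.9, 0.1]   # true class 1 predicted with low confidence

easy_ratio = focal_loss(easy, 1) / cross_entropy(easy, 1)
hard_ratio = focal_loss(hard, 1) / cross_entropy(hard, 1)
```

With `gamma = 2`, the modulating factor keeps (1 - 0.9)^2 = 1% of the cross-entropy loss for the easy example and (1 - 0.1)^2 = 81% for the hard one, which is how the loss shifts attention toward the infrequent class. Setting `gamma = 0` recovers ordinary (alpha-weighted) cross-entropy.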
@misc{dubey2025forecasting,
author = {Dubey, Abhishek and Wilbur, Michael and Mukhopadhyay, Ayan and Laszka, Aron},
month = jan,
title = {Forecasting energy consumption in a mixed-vehicle fleet},
year = {2025},
journal = {US Patent App. 18/708,438},
url = {https://patents.google.com/patent/US20250030766A1/en}
}
@inproceedings{Wilbur2019,
author = {Wilbur, Michael and Dubey, Abhishek and Le{\~{a}}o, Bruno and Bhattacharjee, Shameek},
booktitle = {{IEEE} International Conference on Smart Computing, {SMARTCOMP} 2019, Washington, DC, USA},
title = {A Decentralized Approach for Real Time Anomaly Detection in Transportation Networks},
year = {2019},
month = jun,
acceptance = {29},
pages = {274--282},
bibsource = {dblp computer science bibliography, https://dblp.org},
biburl = {https://dblp.org/rec/bib/conf/smartcomp/WilburDLB19},
category = {selectiveconference},
contribution = {lead},
doi = {10.1109/SMARTCOMP.2019.00063},
file = {:Wilbur2019-A_Decentralized_Approach_for_Real_Time_Anomaly_Detection_in_Transportation_Networks.pdf:PDF},
keywords = {anomaly detection, data integrity attacks, road side units, smart transportation, decentralized systems, clustering algorithms},
project = {cps-reliability,smart-transit,smart-cities},
tag = {ai4cps,platform,decentralization,incident,transit},
timestamp = {Wed, 16 Oct 2019 14:14:54 +0200},
url = {https://doi.org/10.1109/SMARTCOMP.2019.00063},
what = {This paper presents a decentralized anomaly detection framework for smart transportation networks that distributes detection across road side units while identifying orchestrated data integrity attacks. The system employs zone-level detection at RSUs combined with sensor-level detection to handle both deductive and camouflage attacks where adversaries manipulate speed readings. The approach uses hierarchical clustering algorithms for RSU placement optimization.},
why = {Data integrity attacks in transportation systems can have serious consequences, and centralized detection approaches create single points of failure. This work is innovative because it distributes anomaly detection to the network edge while maintaining the ability to identify complex, multi-sensor attacks that try to evade detection. The zone-level and sensor-level detection hierarchy enables efficient resource utilization on constrained edge devices.},
results = {The RSU clustering approach optimized RSU placement to concentrate resources where sensors were dense while minimizing communication overhead. The system demonstrated the ability to detect both deductive attacks, where individual sensor readings are altered, and camouflage attacks, where multiple sensors are manipulated to evade detection. The decentralized approach improved computational efficiency while maintaining detection accuracy.},
project_tags = {transit, emergency, CPS, middleware}
}
These foundational methods power our use-inspired projects: