Why This Matters

Accurate ridership predictions let transit agencies reallocate buses across under- and over-utilized routes, and let passengers plan around overcrowded vehicles. Prediction is difficult because ridership is heterogeneous, evolves over time, and depends on exogenous factors such as weather. Raw automated passenger counter (APC) data is also noisy and must be cleaned, processed, and merged with other sources before it is useful. This work shows how to turn that raw data into day-ahead and same-day ridership predictions for a city-scale network.

What We Did

This paper presents methods for predicting ridership (occupancy) at both the trip and stop levels from noisy APC data. The approach fuses transit, weather, traffic, and calendar data spanning a 2-year period (2020-2022), about 17 million observations in total, and trains separate models for the two formulations: an XGBoost-based model at the trip level and an LSTM-based model at the stop level.
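The fusion and aggregation step described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the record layout, the field names (`temp_c`, `precip_mm`), the negative-load noise filter, and the choice of maximum load as the trip-level target are not taken from the paper, and the values are toy data.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical stop-level APC records:
# (trip_id, stop_sequence, arrival_time, passenger_load)
apc_records = [
    ("trip_1", 1, datetime(2022, 3, 1, 8, 5), 12),
    ("trip_1", 2, datetime(2022, 3, 1, 8, 15), 20),
    ("trip_1", 3, datetime(2022, 3, 1, 9, 2), 17),
    ("trip_2", 1, datetime(2022, 3, 1, 9, 10), 5),
    ("trip_2", 2, datetime(2022, 3, 1, 9, 20), -3),  # negative load: sensor noise
]

# Hypothetical hourly weather observations, keyed by the start of the hour.
weather_by_hour = {
    datetime(2022, 3, 1, 8): {"temp_c": 6.0, "precip_mm": 0.0},
    datetime(2022, 3, 1, 9): {"temp_c": 7.5, "precip_mm": 1.2},
}


def clean(records):
    """Drop physically impossible loads (one simple, assumed noise filter)."""
    return [r for r in records if r[3] >= 0]


def fuse_with_weather(records, weather):
    """Attach the weather of the enclosing hour to each stop-level record."""
    fused = []
    for trip_id, seq, ts, load in records:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        obs = weather.get(hour)
        if obs is None:
            continue  # no weather for this hour: skip rather than guess
        fused.append({"trip_id": trip_id, "stop_sequence": seq,
                      "hour": ts.hour, "load": load, **obs})
    return fused


def aggregate_to_trip(stop_rows):
    """Collapse stop-level rows into one row per trip."""
    by_trip = defaultdict(list)
    for row in stop_rows:
        by_trip[row["trip_id"]].append(row)
    return {
        trip_id: {
            "max_load": max(r["load"] for r in rows),
            "n_stops": len(rows),
            "mean_precip_mm": sum(r["precip_mm"] for r in rows) / len(rows),
        }
        for trip_id, rows in by_trip.items()
    }


stop_rows = fuse_with_weather(clean(apc_records), weather_by_hour)
trip_rows = aggregate_to_trip(stop_rows)
```

In this sketch `stop_rows` would feed a stop-level model and `trip_rows` a trip-level model, mirroring the two formulations. A real pipeline would also join traffic and calendar features in the same way.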

Key Results

Evaluated on real-world data provided by the public transit agency of Nashville, TN, the trip-level XGBoost model and the stop-level LSTM model both outperform a baseline statistical model across the entire transit service day. These predictions give transit agencies a basis for operational planning, such as reallocating buses to match demand.
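A comparison between a learned model and a statistical baseline can be sketched as follows. This is a generic illustration, not the paper's evaluation. The historical-mean baseline, the mean absolute error metric, and every number below are assumptions chosen for the example, and `model_pred` is a placeholder for the output of any trained model.

```python
from collections import defaultdict
from statistics import mean

# Toy training history: (route, hour_of_day, observed_load). Values are invented.
history = [
    ("r1", 8, 20), ("r1", 8, 24), ("r1", 17, 30),
    ("r2", 8, 6), ("r2", 8, 10),
]


def fit_historical_mean(rows):
    """Baseline: predict the mean past load for each (route, hour) pair."""
    groups = defaultdict(list)
    for route, hour, load in rows:
        groups[(route, hour)].append(load)
    return {key: mean(loads) for key, loads in groups.items()}


def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between observed and predicted loads."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


baseline = fit_historical_mean(history)

# Toy held-out observations for two (route, hour) pairs.
test_keys = [("r1", 8), ("r2", 8)]
y_true = [26, 7]

baseline_pred = [baseline[key] for key in test_keys]
model_pred = [25.0, 7.5]  # placeholder: stands in for a trained model's output

baseline_mae = mean_absolute_error(y_true, baseline_pred)
model_mae = mean_absolute_error(y_true, model_pred)
```

Reporting both errors on the same held-out data, ideally broken down by hour of the service day, makes it clear whether a learned model actually improves on the simple baseline.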

Cite This Paper

@inproceedings{talusan2022apc,
  author = {Talusan, Jose Paolo and Mukhopadhyay, Ayan and Freudberg, Dan and Dubey, Abhishek},
  booktitle = {2022 IEEE International Conference on Big Data (Big Data)},
  title = {On Designing Day Ahead and Same Day Ridership Level Prediction Models for City-Scale Transit Networks Using Noisy {APC} Data},
  year = {2022},
  address = {Los Alamitos, CA, USA},
  month = dec,
  pages = {5598--5606},
  publisher = {IEEE Computer Society},
  abstract = {The ability to accurately predict public transit ridership demand benefits passengers and transit agencies. Agencies will be able to reallocate buses to handle under or over-utilized bus routes, improving resource utilization, and passengers will be able to adjust and plan their schedules to avoid overcrowded buses and maintain a certain level of comfort. However, accurately predicting occupancy is a non-trivial task. Various reasons such as heterogeneity, evolving ridership patterns, exogenous events like weather, and other stochastic variables, make the task much more challenging. With the progress of big data, transit authorities now have access to real-time passenger occupancy information for their vehicles. The amount of data generated is staggering. While there is no shortage in data, it must still be cleaned, processed, augmented, and merged before any useful information can be generated. In this paper, we propose the use and fusion of data from multiple sources, cleaned, processed, and merged together, for use in training machine learning models to predict transit ridership. We use data that spans a 2-year period (2020-2022) incorporating transit, weather, traffic, and calendar data. The resulting data, which equates to 17 million observations, is used to train separate models for the trip and stop level prediction. We evaluate our approach on real-world transit data provided by the public transit agency of Nashville, TN. We demonstrate that the trip level model based on Xgboost and the stop level model based on LSTM outperform the baseline statistical model across the entire transit service day.},
  contribution = {lead},
  doi = {10.1109/BigData55660.2022.10020390},
  keywords = {transit prediction, occupancy forecasting, delay prediction, automated passenger counting, machine learning, operational planning, transit optimization, real-time information},
  url = {https://doi.ieeecomputersociety.org/10.1109/BigData55660.2022.10020390},
  month_numeric = {12}
}

Quick Info

Year: 2022

Keywords: transit prediction, occupancy forecasting, delay prediction, automated passenger counting, machine learning, operational planning, transit optimization, real-time information

Research Areas: transit, ML for CPS