About ridevibe.io

Hello everyone and welcome to Ride Vibe.

My name is Culton, and I'm an NYC-based data scientist. On this page you can read a little about how the app works and how to use it.

The tool is a work in progress, so I would love to hear what you think about it and any suggestions you have for improving it. Please visit the contact page if you have questions, would like to collaborate, or just want to get in touch!

Key features

Ride Vibe has two primary pages: the main page and the individual station pages. The main page provides an overview of the whole system, including today's expected volume and percentage comparisons to last month. It also shows a map of the system indicating where in the city people are expected to enter over the next two days (sort of like a weather map), along with a search bar that lets you find individual station pages.

Once you search for your station (see 14 St-Union Sq), the first thing you will see is an isotype indicator with little gray and red people. This provides a quick estimate of the crowd level in the station based on the current expected volume and the number of different trains stopping at the station.

If you're interested, the table below shows the crowd thresholds used to calculate the indicator:

Isotype Indicator Threshold (riders/minute/line)
0-1
1-3
4-8
9-14
15+
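
As a rough illustration of how thresholds like these could be turned into an indicator level, here is a minimal Python sketch. It is not the code running in the app. It assumes the rate is the expected hourly volume divided by 60 and then by the number of lines serving the station. The function name `crowd_level`, the 1-to-5 level numbering, and the choice to treat each range's upper bound as inclusive (so that values falling between the listed ranges, such as 3.5, still get a level) are all assumptions made for this example.

```python
from bisect import bisect_left

# Upper bounds (riders/minute/line) of the first four ranges in the table.
# Treating each bound as inclusive is an assumption; the table does not say
# how in-between values such as 3.5 are handled.
UPPER_BOUNDS = [1, 3, 8, 14]


def crowd_level(expected_riders_per_hour: float, num_lines: int) -> int:
    """Return an illustrative crowd level from 1 (quietest) to 5 (busiest)."""
    if num_lines < 1:
        raise ValueError("num_lines must be at least 1")
    if expected_riders_per_hour < 0:
        raise ValueError("expected_riders_per_hour must be non-negative")
    riders_per_minute_per_line = expected_riders_per_hour / 60 / num_lines
    # bisect_left counts how many upper bounds lie strictly below the rate.
    return bisect_left(UPPER_BOUNDS, riders_per_minute_per_line) + 1
```

For example, a station with 2,400 expected riders in the current hour and 8 lines works out to 5 riders/minute/line, which this sketch maps to the third of the five levels.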

The next two plots show the crowd level relative to the predictions for the rest of today and for the rest of the week. The little white dot indicates the current hour on both charts.

About the Model

The charts on both pages are powered by a time series forecasting model that makes new predictions each day based on the latest available turnstile information and weather predictions. This model was trained on datasets released and maintained by the MTA Open Data Program, in particular the hourly ridership dataset.

Using this data, I developed a forecasting model that can predict hourly ridership for each station up to two weeks in advance. The predictions are based on station attributes, seasonality, weather, and ridership trends. Next up will be adding service changes and large events (e.g. a train isn't running in Brooklyn, or there's a concert near Forest Hills-71 Av).

It is worth mentioning that while the model is based on real ridership numbers, the forecasts shown here are estimates of crowd sizes, not real values. I've highlighted this in two ways within the project:

First, because the ridership forecasts are a result of a statistical modeling process, they produce fractional values (e.g. 251.03 riders per hour). As with the MTA's origin-destination estimates, I've left these values visible in the data to highlight the uncertainty inherent in the predictions.

Second, I've tried to be as transparent as possible about the performance and error of the model. Below, I've provided some summary metrics about the model currently in use, including its training date and offline test performance. I am also working on an accuracy reporting panel that will provide detail on live prediction errors.

Model Name
Training Date
Test Set Start
Test Set End
Test Error (riders/station/hour)
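
For reference, an error expressed in riders/station/hour can be computed as a mean absolute error (MAE) over every station-hour in the test set. The Python sketch below assumes MAE, which may not be the metric reported for the model above. The function name and the sample numbers are made up for illustration.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and forecast riders per station-hour."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("need at least one station-hour")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up values: three station-hours of observed and forecast ridership.
actual = [250, 100, 40]
predicted = [251.5, 90.0, 42.5]
print(mean_absolute_error(actual, predicted))
```

Because each term is an absolute difference in riders for one station and one hour, the average keeps the same riders/station/hour units as the forecasts themselves.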

Known Bugs

Acknowledgements & Attributions