
Overview:

 

The purpose of this project was to determine how the Covid-19 pandemic impacted subway ridership in NYC, and whether vaccination efforts were helping to bring ridership back to pre-pandemic levels. My hypothesis was that the pandemic reduced subway ridership, but that vaccination efforts would help return it to what the MTA would consider normal levels. I gathered vaccination and coronavirus case data, as well as daily ridership numbers from the MTA website. I then cleaned the data, because the method of data collection varied depending on which agency collected it. I then applied linear regression and correlation and plotted the results. I also created preliminary plots to understand how vaccinations and cases related to daily ridership from the start of the pandemic. Finally, I used a SARIMA model to predict how ridership numbers would change over the next 6 months.

These techniques were implemented using the following libraries: Matplotlib, SciPy, seaborn, Pandas, and Scikit-Learn.

You can find the Tableau Dashboard which details how Rental Inventory changed in NYC at the following link: https://github.com/tanveerm176/Covid-Tableau-Viz

Data:

This project was completed using 3 datasets:

  1. Daily Coronavirus Cases

  2. Daily Ridership:

    • This dataset is a CSV file tracking daily subway ridership in NYC from the start of the pandemic, taken here to be 03/01/2020. It was collected from the MTA Info website at the following URL: https://new.mta.info/coronavirus/ridership

  3. Daily Vaccinations:

Exploratory Data Analysis:

 

I started by exploring what the raw numbers looked like. As you can see, plotting ridership on a day-by-day basis leads to a graph that is not easy to read. Even so, we can clearly see the impact that Covid-19 had on MTA subway ridership, driving the numbers down to around 500,000 daily riders, although ridership appears to be recovering. The regular dips in the plot are due to ridership dropping on weekends. Average weekday ridership hovers around 5 million and drops to about 2 million on weekends.

fig7.png

Plotting ridership against cases without accounting for the weekend drop produces a graph that is neither intuitive nor easy to read, but it does provide some insight into how case counts affect daily ridership.

fig3.png

My first approach was to remove the weekend data and plot the rest. Although the graph is a little smoother, it still has a number of places where ridership drops drastically. In addition, removing data might lead to errors going forward.

fig6.png
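
The weekend filter described above could look something like the following. This is only a minimal sketch: the column names `date` and `ridership` and the sample values are assumptions for illustration, not the actual MTA file layout.

```python
import pandas as pd

# Hypothetical sample: two weeks of daily ridership starting on a Monday
# (column names and values are made up for illustration).
df = pd.DataFrame({
    "date": pd.date_range("2020-03-02", periods=14, freq="D"),
    "ridership": [5_000_000] * 5 + [2_000_000] * 2
                 + [4_800_000] * 5 + [1_900_000] * 2,
})

# dayofweek is 0 for Monday through 6 for Sunday, so < 5 keeps weekdays only.
weekdays = df[df["date"].dt.dayofweek < 5].reset_index(drop=True)

print(len(df), "rows before,", len(weekdays), "rows after")
```

Dropping rows this way leaves gaps in the daily index, which is one reason the rolling average was preferred in the end.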

Finally, I decided to average daily ridership over 7 days, creating a plot of the 7-day rolling average. This is not only more intuitive but also accounts for the weekend data that would otherwise skew the analysis. I followed the same process for the daily case count, as seen below.

fig5.png
fig4.png
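
A 7-day rolling average like the one plotted above can be computed with pandas along these lines. The series below is synthetic and its values are assumptions; the same call would apply to the daily case count.

```python
import pandas as pd

# Hypothetical daily series with a weekday/weekend pattern (values are made up).
dates = pd.date_range("2020-03-02", periods=21, freq="D")
values = ([5_000_000] * 5 + [2_000_000] * 2) * 3
ridership = pd.Series(values, index=dates, name="ridership")

# Each point is the mean of that day and the six days before it.
# The first six entries are NaN because a full 7-day window is not yet available.
rolling_7d = ridership.rolling(window=7).mean()

print(rolling_7d.dropna().head())
```

Because every full window contains exactly five weekdays and two weekend days, the weekly dip is averaged out rather than removed.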

Analysis:

These plots show my preliminary findings and are an easy way to understand how the Covid-19 pandemic and the vaccination efforts in NYC were impacting daily MTA ridership.

As you can see in Figure 1, MTA daily ridership was heavily impacted by the Covid-19 pandemic, which drove ridership down to about 500,000 riders at its lowest. As things began to open back up, however, ridership settled at somewhat less than half of pre-pandemic levels. Interestingly, the Delta variant, which caused a surge of cases in the summer, seemed to have little to no effect on ridership, and with vaccinations accessible to the general public, ridership is slowly returning to what it was before the pandemic.

Figure 2 shows a graph that simply plots ridership and vaccination numbers over time. Since vaccinations weren't available to the public until 12/14/2020 and the ridership data starts on 03/01/2020, the graph misrepresents the relationship between vaccine availability and subway ridership.

I compensated for the misaligned start dates by trimming the subway ridership data to begin on the start date of the vaccination dataset. From this we can see that ridership has risen since vaccines became available to the public. Even though vaccinations have tapered off, ridership is still increasing, whether because of masking policies allowing businesses to open back up or because people are trying to put the pandemic behind them.

Moving the vaccination rates to the correct start date gives us a better idea of how vaccinations and ridership are related. The vaccination spike in June 2022 may be due to the Covid-19 variant that is seen as more dangerous, or to the general public becoming more comfortable with vaccines.

An overview plot like the one above doesn't give us much insight. Restricting the graph to the common start date of the vaccination and ridership datasets gives a more comprehensive view of the data.
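
The date alignment described in this section can be sketched roughly as follows. The frame and column names (`ridership`, `vaccinations`, `date`, `doses`) and all values are hypothetical; only the 12/14/2020 vaccination start date comes from the text above.

```python
import pandas as pd

# Hypothetical frames; column names and values are assumptions for illustration.
ridership = pd.DataFrame({
    "date": pd.date_range("2020-12-10", periods=10, freq="D"),
    "ridership": range(1_000_000, 1_000_010),
})
vaccinations = pd.DataFrame({
    "date": pd.date_range("2020-12-14", periods=6, freq="D"),
    "doses": [100, 250, 400, 380, 500, 620],
})

# Trim ridership so it begins on the first day of vaccination data.
start = vaccinations["date"].min()
ridership_aligned = ridership[ridership["date"] >= start]

# An inner join on date keeps only days present in both datasets.
combined = ridership_aligned.merge(vaccinations, on="date", how="inner")

print(combined)
```

An inner join is used here so that the two series share exactly the same dates before plotting or computing a correlation.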

Correlation Plots and Linear Regression:

Correlation between Daily Cases and Daily MTA Ridership:
-0.72

  • There is a strong negative correlation between the case count and subway ridership. This aligns with the plot above: as cases increased, the number of subway riders in NYC dropped drastically.


Correlation between Daily Vaccinations and Daily Ridership:
0.2

  • A slightly positive correlation exists between the number of vaccines administered and MTA subway ridership. From the chart above we can see that an increase in vaccinations coincided with an increase in the number of people taking the subway. However, ridership kept increasing even after vaccination rates slowed down, so the two do not move in lockstep, but the correlation is still positive.
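
For readers who want to reproduce this kind of figure, here is a minimal sketch of a Pearson correlation and a simple linear fit using SciPy. The data is synthetic and generated only to show the calls, so it will not reproduce the -0.72 or 0.2 reported above.

```python
import numpy as np
from scipy import stats

# Synthetic data standing in for the real series (values are made up):
# ridership falls as cases rise, plus some noise.
rng = np.random.default_rng(0)
cases = rng.uniform(0, 6000, size=200)
ridership = 2_500_000 - 250 * cases + rng.normal(0, 300_000, size=200)

# Pearson correlation coefficient and its p-value.
r, p_value = stats.pearsonr(cases, ridership)

# Ordinary least-squares line: ridership ~ intercept + slope * cases.
fit = stats.linregress(cases, ridership)

print(f"r = {r:.2f}, slope = {fit.slope:.1f}, intercept = {fit.intercept:,.0f}")
```

For a single predictor, the `rvalue` returned by `linregress` is the same Pearson coefficient, so either call can be used to report the correlation.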


SARIMA Forecasting:

Finally, I used the SARIMA (Seasonal Auto-Regressive Integrated Moving Average) model to see if I could predict the future of MTA ridership. Using the 7-day moving average for the train and test sets didn't produce a model that predicted the data accurately.

Using the estimated daily counts, which include the weekend drops in ridership, did lead to a more accurate model. The first graph shows a seasonality parameter of 7 days, followed by 14 days, 21 days, and finally 28 days. Visually, the 21-day seasonality seems to perform best on the test data, so I continued with that parameter for the rest of the forecasting.

Further hyperparameter tuning may be needed to find a seasonality parameter that leads to better results.

Using the SARIMA model fitted to the data, I forecast 6 months of subway ridership along with the confidence interval of the forecast. I also evaluated model performance using Mean Absolute Error and Mean Squared Error, which are listed below.


Mean Absolute Error: 700,239

Mean Squared Error: 742,344,225,358
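
Because MSE is expressed in squared riders, it is hard to compare with MAE directly. Taking the square root of the MSE above gives a root mean squared error of roughly 861,600 riders, which is on the same scale as the MAE. The sketch below shows how the three metrics relate using Scikit-Learn; the actual and forecast values in it are made up.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Hypothetical actual and forecast ridership for five days (values are made up).
actual = np.array([2_100_000, 2_300_000, 2_250_000, 1_200_000, 1_000_000])
forecast = np.array([2_000_000, 2_500_000, 2_200_000, 1_500_000, 900_000])

mae = mean_absolute_error(actual, forecast)
mse = mean_squared_error(actual, forecast)
# RMSE is the square root of MSE, so it is in riders rather than riders squared.
rmse = np.sqrt(mse)

print(f"MAE: {mae:,.0f}  MSE: {mse:,.0f}  RMSE: {rmse:,.0f}")
```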

Finally, I plotted the actual ridership data for the date range the model was predicting. As we can see, the model seems to have predicted the drop in ridership that occurred during the December/January months of 2022. Taking a closer look in the second graph, the forecast seems to lag the actual data by about a week. In the last graph I used the same 7-day rolling average to plot how the forecast differed from the actual numbers.

My SARIMA model seems to follow the general trend of ridership decreasing in the winter of 2022 and increasing as the year goes on. Interestingly, the model predicts the drastic decrease in ridership during the holiday months, although not to the degree seen in the actual numbers.

Key Takeaways:

 

From this project we can clearly see how Covid-19 has impacted NYC subway ridership, bringing it to a historic low. Although ridership seems to be increasing slowly, the availability of vaccines doesn't seem to be making much of a difference in bringing riders back to pre-pandemic levels. We also need to be cautious about new variants and about travel during the holiday season, which drastically decreases the number of people taking the subway in NYC, a main source of income for the city and its small businesses. Using a SARIMA model I was able to predict future ridership numbers, but more analysis is needed to create a model that takes into account new variants and changes in how the public perceives the pandemic.
