Time Series Analysis of Total US Construction Spending on Manufacturing Sector (2002-2020)
Introduction
The performance of the manufacturing sector is a key indicator of economic growth for many countries. It reflects the level of technological integration a country has achieved, since the sector is often the first to adopt automated, cutting-edge technology. Growth in manufacturing spending not only signals the importance of innovation, but also supports businesses that rely on the production of goods. I have therefore decided to examine how spending on the construction of manufacturing units in the US has evolved over the past two decades.
Data
The source of the dataset is the Federal Reserve Economic Data (FRED) database, which contains information on more than 750,000 economic time series from 96 different sources, maintained by the Research Division of the Federal Reserve Bank of St. Louis. The dataset is linked in Appendix 1. After importing the dataset, which contained data from January 2002 to February 2021, I took the following steps to clean the dataset:
- Reclassifying columns to fit numerical and datetime formats
- Removing metadata columns containing qualitative information not relevant to this analysis
The first 5 rows of the finalized dataset are shown in Figure 1.
Figure 1: Finalized Dataset (first 5 rows)

Exploration
A plot of the data is shown in Figure 2. There is an upward trend over time, with some drops near 2010 and 2016, as well as a clear seasonal pattern. One particularly noteworthy feature is the sharp drop in spending near the start of 2020. Much of this decrease can be attributed to the COVID-19 pandemic, which brought multiple economies and workplaces to a standstill.
Figure 2
At first glance, this series does not appear stationary, given the visible seasonality and upward trend. To investigate further, I performed an additive decomposition, shown in Figure 3, which confirms the presence of trend and seasonal components. To test the non-stationarity formally, I performed the Augmented Dickey-Fuller (ADF) test, the results of which are shown in Figure 4. As the p-value (0.0842) is greater than the significance level (0.05), I failed to reject the null hypothesis of a unit root, which is consistent with the time series being non-stationary.

Figure 3


Figure 4
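The additive decomposition described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the code used for Figure 3: the function name, the synthetic series, and the choice of a centred moving average for the trend are all assumptions made for the example.

```python
def additive_decompose(x, period=12):
    """Classical additive decomposition: x[t] = trend[t] + seasonal[t] + remainder[t].

    Assumes an even period (e.g. 12 for monthly data) and at least two full cycles.
    """
    n = len(x)
    half = period // 2
    trend = [None] * n
    # Centred moving average: the two end points of the window get half weight
    # so that every position in the cycle contributes equally.
    for t in range(half, n - half):
        window = x[t - half : t + half + 1]
        trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    # Average the detrended values for each position in the cycle.
    sums = [0.0] * period
    counts = [0] * period
    for t in range(n):
        if trend[t] is not None:
            sums[t % period] += x[t] - trend[t]
            counts[t % period] += 1
    raw = [s / c for s, c in zip(sums, counts)]
    centre = sum(raw) / period
    effects = [r - centre for r in raw]  # seasonal effects are forced to sum to zero
    seasonal = [effects[t % period] for t in range(n)]
    remainder = [None if trend[t] is None else x[t] - trend[t] - seasonal[t] for t in range(n)]
    return trend, seasonal, remainder


# Synthetic monthly series (illustrative values only): linear trend plus a fixed 12-month pattern.
pattern = [5, 3, 0, -2, -4, -6, -5, -3, 0, 2, 4, 6]
series = [100 + 2 * t + pattern[t % 12] for t in range(48)]
trend, seasonal, remainder = additive_decompose(series, period=12)
```

On this synthetic input the function recovers the linear trend and the 12-month pattern, leaving a remainder of essentially zero. The trend is undefined for the first and last six months because the moving-average window does not fit there.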
Transformation
To make the data stationary, I applied a first difference at lag 1 (to remove the trend) and a seasonal difference at lag 12 (to remove the seasonality; since the data is monthly, the seasonal period is 12). Figure 5 compares the dataset before and after differencing, and Figure 6 shows the results of the ADF test performed on the differenced dataset. From Figure 5, we can see that the differenced data appears to be centred around 0 and that its variance is relatively constant over the timespan, i.e. it does not depend on time. This result, combined with the p-value from the revised ADF test (<0.01), allows us to reject the null hypothesis of a unit root at the 5% significance level and to treat the differenced series as stationary.
Figure 5

Figure 6
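The two differencing steps amount to computing y_t = (x_t - x_{t-1}) - (x_{t-12} - x_{t-13}). A minimal sketch in plain Python follows; the helper name and the synthetic series are assumptions for illustration, not the code or data behind Figure 5.

```python
def difference(x, lag=1):
    """Return the lag-`lag` difference y[t] = x[t] - x[t - lag]."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]


# Synthetic monthly series (illustrative values only): linear trend plus a fixed 12-month pattern.
pattern = [5, 3, 0, -2, -4, -6, -5, -3, 0, 2, 4, 6]
series = [100 + 2 * t + pattern[t % 12] for t in range(48)]

# The lag-1 difference removes the linear trend; the lag-12 difference removes the seasonal pattern.
detrended = difference(series, lag=1)
stationary = difference(detrended, lag=12)
```

Each differencing step shortens the series by its lag, so 48 observations become 35 here. Because the synthetic series contains nothing but a linear trend and a fixed seasonal pattern, the result is identically zero; on real data what remains is the irregular component.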

Model Selection
To select an appropriate model, I first examined the autocorrelation function (ACF) and partial autocorrelation function (PACF), shown in Figure 7. From this figure, I first tried to identify an appropriate non-seasonal ARIMA model. The ACF and PACF both cut off after the initial lags, suggesting a low-order non-seasonal ARMA model.
Given that the period of the dataset is 12, we now look at the behaviour of the ACF and PACF at the seasonal lags. The ACF is significant at the first 2 seasonal lags (lags 12 and 24), after which it cuts off. The PACF, meanwhile, is significant at the first 4 seasonal lags, so it tails off more slowly. An ACF that cuts off while the PACF tails off is characteristic of a moving-average process, which suggests a seasonal MA(2) model.
Because the ACF and PACF plots do not point unambiguously to a single model, I compared several candidate models using their AICc values.

Figure 7
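For reference, the sample ACF read off a plot like Figure 7 can be computed as below. This is a sketch in plain Python with hypothetical function names; the PACF is omitted for brevity, and the approximate 95% bound of 1.96/sqrt(n) is the usual white-noise band drawn on such plots.

```python
import math


def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag, normalised by the lag-0 sum of squares."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]


def significance_bound(n):
    """Approximate 95% bound for the ACF of white noise: 1.96 / sqrt(n)."""
    return 1.96 / math.sqrt(n)


# Toy input (illustrative only): a strictly alternating series is strongly autocorrelated.
toy = [1, -1] * 10
acf = sample_acf(toy, max_lag=2)
bound = significance_bound(len(toy))
significant_lags = [k for k in range(1, len(acf)) if abs(acf[k]) > bound]
```

For monthly data, the seasonal lags discussed above correspond to indices 12, 24, 36 and so on in the returned list.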
Figure 8 shows the AICc values for the different models. Among the low-order non-seasonal ARIMA specifications I fitted, the model with order (1,1,3) has the lowest AICc. Therefore, the final model for this time series is ARIMA(1,1,3)x(0,1,2)[12]. Figure 9 shows a forecast based on this model.

Figure 8
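As a reminder of what is being compared, the AICc is the AIC with a small-sample correction: AICc = -2 log L + 2k + 2k(k+1)/(n-k-1), where k is the number of estimated parameters and n the number of observations used in the fit. The sketch below is an assumption-laden illustration: the log-likelihoods are made-up numbers, not the values in Figure 8, and software packages differ in what they count in k and n.

```python
def aicc(log_likelihood, k, n):
    """Corrected Akaike information criterion for k parameters and n observations."""
    aic = -2.0 * log_likelihood + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


# Hypothetical candidates: name -> (log-likelihood, number of parameters). Not real results.
candidates = {"model A": (-100.0, 3), "model B": (-99.5, 5)}
n_obs = 50
scores = {name: aicc(ll, k, n_obs) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
```

In this toy comparison the larger model fits slightly better but is penalised for its two extra parameters, so the smaller model has the lower AICc.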

Figure 9
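Written out with the backshift operator B, the selected (1,1,3)x(0,1,2)[12] model has the form below. The symbols are assumptions for illustration: x_t is the observed series, w_t is white noise, and the sign convention for the moving-average terms varies between textbooks and software, so the signs of the estimates in Appendix 2 should be read against the convention of the package used.

```latex
(1 - \phi_1 B)\,(1 - B)\,(1 - B^{12})\, x_t
  = (1 + \theta_1 B + \theta_2 B^2 + \theta_3 B^3)\,
    (1 + \Theta_1 B^{12} + \Theta_2 B^{24})\, w_t
```

The factors (1 - B) and (1 - B^{12}) are the lag-1 and lag-12 differences from the Transformation section, \phi_1 is the single non-seasonal AR coefficient, \theta_1 to \theta_3 are the non-seasonal MA coefficients, and \Theta_1 and \Theta_2 are the seasonal MA coefficients.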
Model Diagnostics
The estimated parameters for the chosen (1,1,3)x(0,1,2)[12] model are given in Appendix 2. In the model diagnostics, shown in Figure 10, the ACF of the residuals is not significant at any lag. Additionally, the p-values are all above the significance threshold. These findings suggest that an adequate model has been fit to this time series.
Figure 10
Additionally, the Q-Q plot in Figure 11 suggests that the residuals are approximately normally distributed.
Figure 11
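The report does not name the test behind the p-values in Figure 10. A common choice for this kind of residual diagnostic is the Ljung-Box test, and the sketch below assumes that test purely for illustration. It computes only the Q statistic in plain Python; the function name and the toy input are hypothetical.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q = n (n + 2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    c0 = sum((r - mean) ** 2 for r in residuals)
    total = 0.0
    for k in range(1, max_lag + 1):
        ck = sum((residuals[t] - mean) * (residuals[t + k] - mean) for t in range(n - k))
        rho_k = ck / c0  # sample autocorrelation at lag k
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total


# Toy input (illustrative only): a strictly alternating series is far from white noise.
toy_residuals = [1, -1] * 10
q_stat = ljung_box_q(toy_residuals, max_lag=1)
# 3.841 is the 95% critical value of a chi-square distribution with 1 degree of freedom.
rejects_white_noise = q_stat > 3.841
```

Under the null hypothesis of uncorrelated residuals, Q is approximately chi-square distributed. For residuals of a fitted ARMA model the degrees of freedom are usually reduced by the number of estimated ARMA coefficients, which this sketch does not handle.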


Holt-Winters Forecasting
To provide an alternative method for forecasting future values, I also implemented a Holt-Winters forecast. The forecast, shown in Figure 12, uses the multiplicative version, because the seasonal variation in the time series increases over time. The forecast estimates are provided in Appendix 3.
The forecast is reasonable, since it captures the seasonal movements well while also recognizing the upward trend present near the end of the data. Much like the ARIMA forecast, it does not account for the sudden drop witnessed near the beginning of 2020 due to the COVID-19 pandemic. Given the unprecedented nature of that event, large forecast errors are to be expected around this period. Nevertheless, the recovering economy has bolstered productivity, and the upward trend in manufacturing spending holds despite the pandemic.
Figure 12
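The multiplicative Holt-Winters recursions can be sketched as follows. This is a simplified illustration in plain Python, not the implementation behind Figure 12: the smoothing parameters are fixed by hand rather than optimised, the initialisation is a simple heuristic, and the function name and toy data are assumptions.

```python
def holt_winters_multiplicative(x, period, alpha, beta, gamma, horizon):
    """Multiplicative Holt-Winters forecast; requires at least two full cycles of data."""
    # Heuristic initial values: mean of the first cycle, average per-step change
    # between the first two cycles, and first-cycle ratios to the initial level.
    level = sum(x[:period]) / period
    trend = (sum(x[period : 2 * period]) - sum(x[:period])) / period ** 2
    seasonal = [x[i] / level for i in range(period)]
    for t in range(period, len(x)):
        s = seasonal[t % period]
        prev_level = level
        # Level: deseasonalised observation blended with the previous one-step projection.
        level = alpha * (x[t] / s) + (1 - alpha) * (prev_level + trend)
        # Trend: latest change in level blended with the previous trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
        # Seasonal factor: ratio of the observation to the new level, blended with the old factor.
        seasonal[t % period] = gamma * (x[t] / level) + (1 - gamma) * s
    n = len(x)
    return [(level + h * trend) * seasonal[(n - 1 + h) % period] for h in range(1, horizon + 1)]


# Toy quarterly series (illustrative only): constant level 100 with seasonal factors 0.8, 1.0, 1.2, 1.0.
toy = [80, 100, 120, 100] * 3
forecast = holt_winters_multiplicative(toy, period=4, alpha=0.3, beta=0.1, gamma=0.2, horizon=4)
```

Because the seasonal component multiplies the level, the size of the seasonal swings grows as the level rises, which is why this version suits a series whose seasonality increases over time. On the toy series, which has no trend, the forecast simply repeats the seasonal pattern.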

Comparing the two forecasts, both capture the seasonality well, and the Holt-Winters forecast shows a slightly steeper upward trend than the ARIMA forecast. Quantitatively, Figure 13 reports key performance metrics for both forecasts.
Figure 13: Forecast Diagnostics
Based on these metrics, the two forecasting methods have comparable accuracy.
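The report does not list which metrics Figure 13 contains. As an illustration of how such a comparison is typically made, the sketch below computes three common accuracy measures (MAE, RMSE and MAPE) in plain Python. The choice of metrics, the function name and the numbers are all assumptions.

```python
import math


def forecast_errors(actual, predicted):
    """Return MAE, RMSE and MAPE (in percent) for paired actual and predicted values."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    return {
        "MAE": sum(abs(e) for e in errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAPE": 100.0 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
    }


# Hypothetical values for illustration only, not taken from the dataset.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
metrics = forecast_errors(actual, predicted)
```

Computing the same dictionary for each forecasting method on the same set of observations gives a like-for-like comparison. MAPE is undefined when an actual value is zero, which is not a concern for spending data of this kind.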

Conclusion
In the above analysis, I built a seasonal ARIMA model and used it to forecast future values of construction spending in the manufacturing sector. I used ADF tests along with differencing to handle the trend and seasonality components, and then used the ACF and PACF plots along with AICc values to arrive at an adequate model for this time series.
These analyses and forecasts show a general upward trend in spending on the construction of manufacturing sites. The effect of the COVID-19 pandemic is visible in the dataset, where a sharp drop can be seen in Figure 2 over the 2020 period. The model forecasts a recovery; however, due to the unexpected and unprecedented nature of the pandemic, the forecast does not account for the sudden drop in activity that has been witnessed worldwide. A future study using intervention analysis to account for sudden changes in the manufacturing sector due to COVID-19 is a topic I intend to pursue further.
The upward trend in manufacturing spending is a healthy sign for the US economy. The sector powers various businesses and serves as a benchmark for technological innovation. A swift recovery from the pandemic has boosted integration levels and productivity, re-establishing manufacturing as a growing and prominent sector.
Appendix
Appendix 1: Dataset URL
Appendix 2: Parameter Estimates for (1,1,3)x(0,1,2) Model:

Appendix 3: Holt-Winters Forecast Values:
