The accuracy of U.S. Social Security Administration (SSA) demographic and financial forecasts is crucial for the solvency of its Trust Funds, government programs comprising greater than 50% of all federal government expenditures, industry decision making, and the evidence base of many scholarly articles. Forecasts are also essential for scoring policy proposals, put forward by both political parties, in order to quantify their effect on the program and the budget. Because SSA makes public little replication information, and uses ad hoc, qualitative, and antiquated statistical forecasting methods, no one in or out of government has been able to produce fully independent alternative forecasts or policy scorings. Yet, no systematic evaluation of SSA forecasts has ever been published by SSA or anyone else. We show that SSA’s forecasting errors were approximately unbiased until about 2000, but then began to grow quickly, with increasingly overconfident uncertainty intervals. Moreover, the errors all turn out to be in the same potentially dangerous direction, each making the Social Security Trust Funds look healthier than they actually are. We also discover the cause of these findings with evidence from a large number of interviews we conducted with participants at every level of the forecasting and policy processes. We show that SSA’s forecasting procedures meet all the conditions the modern social-psychology and statistical literatures demonstrate make bias likely. When those conditions mixed with potent new political forces trying to change Social Security and influence the forecasts, SSA’s actuaries hunkered down trying hard to insulate themselves from the intense political pressures. 
Unfortunately, this otherwise laudable resistance to undue influence, along with their ad hoc qualitative forecasting models, led them to also miss important changes in the input data, such as retirees living longer lives, and drawing more benefits, than predicted by simple extrapolations. We explain that solving this problem involves (a) removing human judgment where possible, by using formal statistical methods, via the revolution in data science and big data; (b) instituting formal structural procedures when human judgment is required, via the revolution in social psychological research; and (c) requiring transparency and data sharing to catch errors that slip through, via the revolution in data sharing and replication.


Frequently Asked Questions

You write that no other institution makes fully independent forecasts. What about the Congressional Budget Office?

The Congressional Budget Office (CBO) uses the SSA’s fertility forecast as an input to its forecasting model. Before 2013, CBO also used SSA’s mortality forecasts as inputs to its model.

CBO explains on page 103 of The 2014 Long-Term Budget Outlook: “CBO used projected values from the Social Security trustees for fertility rates but produced its own projections for immigration and mortality rates. Together, those projections imply a total U.S. population of 395 million in 2039, compared with 324 million today. CBO also produced its own projection of the rate at which people will qualify for Social Security’s Disability Insurance program in coming decades.”

How many ultimate rates of mortality decline does the Social Security Administration choose?

The number of ultimate rates of mortality decline has changed over time. Between 1982 and 2011, the number chosen equaled 210 (5 broad age groups × 2 sexes × 7 causes of death × 3 cost scenarios). Since 2012, SSA has reduced the number of causes of death from 7 to 5, applied uniform ultimate rates of decline for males and females, and uniformly scaled the ultimate rates of decline for the low and high cost scenarios as 1/2 and 5/3, respectively, of the set of intermediate cost rates of decline.
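
The counts above are simple products, and the post-2012 scaling is a pair of fixed multipliers, so both can be checked with a short sketch. All names here are illustrative rather than SSA's own. The post-2012 count assumes the five broad age groups were left unchanged, which the answer above does not state, and the example rate is hypothetical.

```python
from fractions import Fraction

# Structure used between 1982 and 2011, as described above.
AGE_GROUPS, SEXES, CAUSES, SCENARIOS = 5, 2, 7, 3
rates_1982_2011 = AGE_GROUPS * SEXES * CAUSES * SCENARIOS
print(rates_1982_2011)  # 210

# Since 2012: 5 causes, one rate for both sexes, low/high derived by scaling.
# ASSUMPTION (not stated in the text): still 5 broad age groups.
intermediate_rates_since_2012 = AGE_GROUPS * 5

LOW_COST_SCALE = Fraction(1, 2)   # low cost = 1/2 of the intermediate rate
HIGH_COST_SCALE = Fraction(5, 3)  # high cost = 5/3 of the intermediate rate


def scenario_rates(intermediate_rate):
    """Return (low, intermediate, high) cost ultimate rates of decline."""
    r = Fraction(intermediate_rate)
    return (r * LOW_COST_SCALE, r, r * HIGH_COST_SCALE)


# Hypothetical intermediate rate of 0.6% per year, for illustration only.
low, mid, high = scenario_rates(Fraction(6, 1000))
print(float(low), float(mid), float(high))
```

Under that assumption, only the intermediate rates are chosen freely after 2012; the low and high cost rates follow mechanically from them.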

How do you measure uncertainty of SSA policy scores?

As an analogy, we can think of a policy score as the coefficient (an intended causal effect) in a regression of a policy output (such as the balance or cost rate) on the treatment variable (whether or not the proposed policy is adopted) plus an error term. SSA offers no uncertainty estimates for this estimated causal effect, although of course some causal effects are likely to be better estimated or better known than others. Sometimes, given ex ante assumptions, we may think an effect is known with a high degree of certainty. However, causal effects are never observed in the real world; only the policy outputs are ever observed. To empirically estimate what will happen in the real world if a policy is adopted, or to evaluate a claim about a causal effect’s size or its uncertainty in a way that makes oneself vulnerable to being proven wrong, we must rely on forecasts under present law and forecasts under the counterfactual condition of the policy being adopted. It is the uncertainty of the forecast under present law that our papers show how to estimate using the observed forecast errors. In this evaluation, we find that most of what could be observed of the impact of the causal effects is swamped by these uncertainty estimates. For example, the most recent SSA evaluation of a policy proposal gives a graphic illustration in Figure 1, which plots the point estimate of the Trust Fund Ratio for each year in the future, under both present law and a proposed law under consideration; each of these lines has uncertainty at least as large as we estimate in our paper. There is also additional uncertainty, over and above forecast errors, because we do not know exactly what would happen if the policy were actually changed, and how all the workers, beneficiaries, government officials, and others would respond under the new regime.
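
The comparison described above can be illustrated with a minimal sketch. It is not the estimator used in our papers: it summarizes past forecast errors by their root mean square error and treats a policy score as "swamped" when it is smaller than the half-width of an approximate 95% interval around the present-law forecast. The function names, the normal-interval assumption, and every number are hypothetical, and a centered interval ignores the systematic bias discussed elsewhere on this page.

```python
import math


def rmse(errors):
    """Root mean square of past forecast errors (forecast minus observed)."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def interval(point, spread, z=1.96):
    """Approximate 95% interval; ASSUMES roughly normal, centered errors."""
    return (point - z * spread, point + z * spread)


def score_is_swamped(present_law, proposed_law, past_errors, z=1.96):
    """True if the policy score is smaller than the interval half-width."""
    score = proposed_law - present_law
    return abs(score) < z * rmse(past_errors)


# Hypothetical numbers, for illustration only (not SSA data).
past_errors = [3.0, -4.0, 5.0, -2.0]
print(interval(100.0, rmse(past_errors)))
print(score_is_swamped(100.0, 104.0, past_errors))  # small score
print(score_is_swamped(100.0, 120.0, past_errors))  # large score
```

In this toy setup a score of 4 cannot be distinguished from forecast noise, while a score of 20 can. The additional uncertainty about how people would respond to the new policy is not modeled here and would widen the intervals further.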

When is it acceptable for the Social Security Administration to bias today’s forecast towards yesterday’s forecast, producing an artificially smooth forecast?

Smoothing in this way can be advantageous statistically because it reduces variance, and possibly mean square error, if there is no systematic bias. Unfortunately, SSA forecasts are systematically biased, and so smoothing is not helpful here. Another possible rationale is to protect the public so that it does not worry about the future of Social Security. Whether this paternalistic position is appropriate is, of course, a normative choice. Our own view is that, whenever possible, the government should be in the position of giving accurate forecasts and telling the public the truth as soon as it knows it. The government can and should accompany point estimates with appropriately confident uncertainty estimates. If public officials or the public do not understand these uncertainty estimates, then it is incumbent upon government officials, and those of us who pay attention to what they do, to be good teachers. Politicians and the public may not have the time to deal with the details very often, but in our experience it is not difficult to convey important points like these.
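
The statistical trade-off can be made concrete with a stylized sketch, using the standard decomposition of mean square error into squared bias plus variance. The model is an assumption made for illustration, not a description of SSA's procedure: the smoothed forecast is a weighted average of yesterday's and today's forecasts, both with the same error variance, and any systematic bias sits in yesterday's forecast. All names and numbers are hypothetical.

```python
def smoothed_mse(bias_yesterday, bias_today, variance, weight, rho=0.0):
    """MSE of weight * yesterday's forecast + (1 - weight) * today's forecast.

    ASSUMES both forecasts target the same quantity, have equal error
    variance, and have error correlation rho.
    """
    w = weight
    bias = w * bias_yesterday + (1 - w) * bias_today
    var = variance * (w * w + (1 - w) * (1 - w) + 2 * w * (1 - w) * rho)
    return bias * bias + var


# No bias anywhere: smoothing halves the variance and lowers the MSE.
print(smoothed_mse(0.0, 0.0, 4.0, weight=0.0))  # unsmoothed
print(smoothed_mse(0.0, 0.0, 4.0, weight=0.5))  # smoothed

# Yesterday's forecast is biased: smoothing pulls today's forecast toward it.
print(smoothed_mse(4.0, 0.0, 4.0, weight=0.0))  # unsmoothed
print(smoothed_mse(4.0, 0.0, 4.0, weight=0.5))  # smoothed
```

With these numbers, smoothing lowers the mean square error from 4 to 2 when neither forecast is biased, but raises it from 4 to 6 when yesterday's forecast is biased, because the imported squared bias outweighs the variance reduction.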

How soon could SSA become aware of errors in their forecasts?

For all the financial indicators, the error in last year’s one-year-ahead forecast is known before this year’s forecast is issued. However, SSA receives mortality data from the National Center for Health Statistics with a 2- to 4-year lag.

Why do you start your analysis in 1978 for the financial variables and 1982 for life expectancy?

Before these years, forecasts of these variables were only sporadically and irregularly reported. Moreover, SSA began reporting its current high-intermediate-low cost scenarios in 1982.

Have you evaluated forecasts from CBO or any other government agency?

No. That would be a great project, which we encourage others to take up.

Who did you interview and how did you select them?

We interviewed a sample of participants in the forecasting process, including those who try to influence the process, use the forecasts, make proposals to change Social Security, and comment publicly or privately on the process. Our sample included current or former high- and low-profile public officials in Congress, the White House, and the Social Security Administration, including Democrats, Republicans, liberals, conservatives, and those on various advisory boards. We also included some in academia and the private sector. Our design was a stratified sequential quota sample, with strata defined by role in the process. The sequential part of the process involved sampling and conducting interviews within each stratum until we heard the same stories and the same points sufficiently often that we could reliably predict what the next person was going to say when prompted with the same question. We tested this hypothesis, making ourselves vulnerable to being proven wrong, by making predictions and seeing what the next person would say. Of course, each person added more color, detail, and information, but at some point the information we gathered about our essential questions reached well past the point of diminishing returns, and so we stopped. We found individuals by enumeration and snowball sampling techniques; we were able to find all but a few people we attempted to find, and almost everyone we asked freely gave of their time to speak with us. Part of the reason for this success in reaching people is that we promised confidentiality to each respondent; we did this whether or not they asked for it.