Performance of climate model projections


A recently published article by Zeke Hausfather et al. concludes that models have so far been quite skilful at projecting future global warming. However, this certificate of satisfaction must be put into perspective, because it maintains that climate sensitivity can be observed from instrumental temperature records, and because it does not clearly distinguish the role of the models from that of the scenarios used to simulate the future. Climatologists should try to falsify current theses rather than take on the mission of proving them. Here are my comments on this review article.

Article: “Evaluating the performance of past climate model projections”,
Zeke Hausfather et al., AGU 2019, doi: 10.1029/2019GL085378

From the plain language summary:

“Model projections rely on two things to accurately match observations: accurate modeling of climate physics, and accurate assumptions around future emissions of CO2 and other factors affecting the climate.”

Not only is accurate modeling necessary, it must also be comprehensive, covering all the phenomena involved. Unavoidable simplifications make parameter tuning necessary; this is a practical way to obtain results, but there is little science in it. An a posteriori model evaluation has to distinguish between what the model calculates and how well the assumptions made about the future have held up. To assess the former, a model hindcast needs to be performed by running the model against the actual changes in the independent variables over a sufficiently long historical period. No model is required to assess the accuracy of the scenario of emissions and other artificial or natural forcings projected at the time; it suffices to compare the actual data series with the scenario as it was designed. The quality of a model should not depend on the validity of the scenarios.

“This research should help resolve public confusion around the performance of past climate modeling efforts, and increases our confidence that models are accurately projecting global warming.”

Setting such an objective for a research project amounts to admitting at the outset a willingness to perform what is called “advocacy research”, i.e. proving an argument rather than looking hard for falsification.

Fourteen model projections were assessed over the period between the publication date of each model and the end of 2017; the projection period of the IPCC’s Fourth Assessment Report (2007) is too short to draw significant conclusions. The comparison is made against five published global mean surface temperature (GMST) time series. However, validating models does not require any future projection: a hindcast should be made by re-running the old models today with the same initial conditions and tuning parameters, applying a single scenario, namely the actual history of emissions and other contributions to anthropogenic forcing. Therefore, CMIP 4 and 5 models could have been included.

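An “Implied Transient Climate Response” (TCR, a measure of the short to mid-term response of GMST to the forcing associated with a doubling of the atmospheric CO2 concentration) is defined as the ratio of the temperature change to the change in anthropogenic forcing over the projection period, scaled by the forcing for doubled CO2:

$$\mathrm{TCR}_{\mathrm{implied}} = F_{2\times}\,\frac{\Delta T}{\Delta F_{\mathrm{anthro}}}$$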

This definition in turn rests on two assumptions:

  1. that the anthropogenic forcing component is known (3.7 W m-2 for 2xCO2?)
    I suppose this is the primary radiative forcing due to the absorption of the Earth’s surface infrared emission by the greenhouse gas CO2.
  2. that the climate response depends linearly on it.
    The authors argue that this is the case because it is expected to be so with a slowly varying forcing.
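To make the two assumptions concrete, here is a minimal sketch of how an implied TCR could be computed under the linear hypothesis, taking it as the doubling forcing multiplied by the ratio of the temperature trend to the forcing trend. This is my reading of the definition, not the authors' code. The function names and the synthetic series are hypothetical and serve only as an illustration; the 3.7 W m-2 value is the one quoted above.

```python
# Sketch only: implied TCR under the assumption of a linear response.
# The data below are synthetic and purely illustrative, not observations.

F_2X = 3.7  # W m-2, forcing for a doubling of CO2 (value quoted in the text)


def linear_trend(years, values):
    """Ordinary least-squares slope of `values` against `years`."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    sxx = sum((x - mean_x) ** 2 for x in years)
    return sxy / sxx


def implied_tcr(years, temperature, forcing, f_2x=F_2X):
    """Implied TCR = F_2x * (temperature trend) / (forcing trend).

    Both assumptions enter here: the forcing series is taken as known,
    and the response is taken as proportional to it.
    """
    dT_dt = linear_trend(years, temperature)  # K per year
    dF_dt = linear_trend(years, forcing)      # W m-2 per year
    return f_2x * dT_dt / dF_dt


# Hypothetical series: forcing rising by 0.04 W m-2 per year and
# temperature rising by 0.018 K per year.
years = list(range(1990, 2018))
forcing = [0.04 * (y - 1990) for y in years]
temperature = [0.018 * (y - 1990) for y in years]

tcr = implied_tcr(years, temperature, forcing)
print(f"implied TCR: {tcr:.3f} K per doubling of CO2")
```

Note that the calculation needs a forcing series as input. For a model this series is prescribed, but for observations it has to be estimated separately, which is where the difficulty lies.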

The subsequent feedbacks responding to the primary forcing are the object of the model computations from which a calculated TCR should be derived. No linearity should be assumed.

Also, to draw an implied TCR from temperature observations is not only difficult but impossible to do… without applying the same model in reverse to observed instrumental data. This methodology has quite a tautological character.

The part of GMST to be attributed to anthropogenic forcing can be made explicit in the models, but not in the observed temperature time series, which depend on many more variables. Thus, to speak of an “observed TCR” is not just a semantic error, it is wrong. Notwithstanding this objection, this is precisely what is done in this article.

Climate sensitivity remains one of the major stumbling blocks (if not the major one) of climate science.


Fig. 2 of the article shows the actual versus calculated rates of warming, together with the “implied TCR”.

Surprisingly, the range of the observed temperatures (top chart) gets wider with more recent models (on the right of the graph); would this indicate that temperature actually had a larger variability during the more recent observation periods? Or is it due to some statistical artefact?

The bottom of the chart represents “model calculated TCR” vs “model calculated TCR from observed temperature series”. They overlap quite well, which is not surprising due to the tautological methodology implied.

One model that got much public attention was the 1988 assessment made by Hansen et al. It shows larger discrepancies between calculated and observed values.

The authors attribute this discrepancy to the chosen scenarios. Therefore, nothing can be said about the intrinsic quality of the model. Indeed, the evolution of the forcing assumed in the three scenarios is given in the supplementary material. The projected 2017 values are a CO2 concentration of 408 ppm for A, 402 ppm for B, and 368 ppm for C, while the actual value was 405 ppm. Thus, the “better” projection C was achieved with a wrongly assumed CO2 concentration. This is a quality certificate for neither the model nor the scenario.
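The scenario check needs no climate model, only arithmetic on the figures quoted above. The sketch below compares each scenario's 2017 CO2 concentration with the actual value. As an additional assumption not taken from the article, it converts the gap into an approximate CO2-only forcing difference using the widely used simplified logarithmic expression ΔF = 5.35 ln(C/C0) W m-2. The scenarios also differ in other gases, so this covers only part of the forcing mismatch.

```python
import math

# 2017 CO2 concentrations in ppm, as quoted in the text.
ACTUAL_PPM = 405.0
SCENARIO_PPM = {"A": 408.0, "B": 402.0, "C": 368.0}

# Assumption (not from the article): simplified logarithmic CO2 forcing,
# dF = ALPHA * ln(C / C0), with ALPHA = 5.35 W m-2.
ALPHA = 5.35


def compare(scenario_ppm, actual_ppm=ACTUAL_PPM):
    """Return (ppm difference, percent difference, CO2 forcing difference)."""
    diff = scenario_ppm - actual_ppm
    pct = 100.0 * diff / actual_ppm
    d_forcing = ALPHA * math.log(scenario_ppm / actual_ppm)
    return diff, pct, d_forcing


results = {name: compare(ppm) for name, ppm in SCENARIO_PPM.items()}

for name, (diff, pct, d_forcing) in results.items():
    print(f"Scenario {name}: {diff:+.0f} ppm ({pct:+.1f} %), "
          f"CO2 forcing gap {d_forcing:+.2f} W m-2")
```

Scenarios A and B miss the actual concentration by 3 ppm, less than 1 %, whereas scenario C is 37 ppm, about 9 %, too low.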

A more critical mind would rather suggest that, had the model sensitivity been reduced to approx. 40% of the basic model assumption, a correct hindcast would have been delivered.


The conclusion of the authors is one of satisfaction with the “skillfulness” of the models, including the earlier ones, in predicting subsequent warming in the years after publication. They also underline the mismatch of so-called high-profile models for which the observed forcing (CO2 emissions) was actually weaker than projected.

An uninvolved reader may not reach the same conclusion.

The trick of extracting a TCR value out of observed temperature series cannot be tolerated.

The range of results is wide compared with the actual variability of the global climate system, which does not allow for significance; thus, what may stay within the ballpark for a few decades may have no validity when extrapolated to longer periods, up to the end of the century and even beyond.

While it may be satisfactory that most ranges overlap, the Hansen 1988 case indicates that the climate model community, or at least the authors of this article, have a hard time distinguishing clearly between true observations, the hindcast simulations that should validate the models, and the plausibility of the forecast scenarios used to calculate possible climate projections.

There are practical reasons why old models cannot be re-run. But it should be remembered that, in its time, each model was tuned to match the temperature evolution then in progress; up to that point, it was designed to be accurate. The same applies to today’s models. This does not amount to their validation. If, at a later date, they show a good match with reality, it may be a lucky strike, as in case C of Hansen 1988, or it may stem from true skill. A mere comparison such as the one made in this article does not prove anything; a deeper diagnosis would be required.

Furthermore, knowing that the most extreme and unlikely scenarios (e.g. RCP 8.5 in the CMIP 5 comparisons) are those that are retained to frighten everyone and to demand immediate and drastic mitigation measures, the scientific division of the Global Climate Business should avoid claiming the high quality of such approximate matches; on the contrary, they should stress their lack of confidence and the limits of their knowledge.
