Models Wrong About the Past Produce Unbelievable Futures

Models vs. Observations. Christy and McKitrick (2018) Figure 3

The title of this post is the theme driven home by Patrick J. Michaels in his critique of the most recent US National Climate Assessment (NA4). The failure of General Circulation Models (GCMs) is the focal point of his presentation of February 14, 2018, Comments on the Fourth National Climate Assessment. Excerpts are in italics with my bolds.

NA4 uses a flawed ensemble of models that dramatically overforecast warming of the lower troposphere, with even larger errors in the upper tropical troposphere. The model ensemble also could not accommodate the “pause” or “slowdown” in warming between the two large El Niños of 1997-8 and 2015-6. The distribution of warming rates within the CMIP5 ensemble is not a true indication of a statistical range of prospective warming, as it is a collection of systematic errors. Despite a glib statement about this Assessment fulfilling the terms of the federal Data Quality Act, that is fatuous. The use of systematically failing models does not fulfill the “maximizing the quality, objectivity, utility, and integrity of information” provision of the Act.

USGCRP should produce a reset Assessment, relying on a model or models that work in four dimensions for future guidance and ignoring the ones that don’t.

Why wasn’t this done to begin with? The model INM-CM4 is spot on, both at the surface and in the vertical, but using it would have largely meant the end of warming as a significant issue. Under a realistic emission scenario (which USGCRP also did not use), INM-CM4 strongly supports the “lukewarm” synthesis of global warming. Given the culture of alarmism that has infected the global change community since before the first (2000) Assessment, using this model would have been a complete turnaround with serious implications.

The new Assessment should employ best scientific practice, and one that weather forecasters use every day. In the climate sphere, billions of dollars are at stake, and reliable forecasts are also critical.

The theme is now picked up in the latest NIPCC report on Fossil Fuels. Chapter 2 is the Climate Science background and the statements below in italics with my bolds come from there.

Chapter 2 Climate Science Climate Change Reconsidered II: Fossil Fuels

Of the 102 model runs considered by Christy and McKitrick, only one comes close to accurately hindcasting temperatures since 1979: the INM-CM4 model produced by the Institute for Numerical Mathematics of the Russian Academy of Sciences (Volodin and Gritsun, 2018). That model projects only 1.4°C warming by the end of the century, similar to the forecast made by the Nongovernmental International Panel on Climate Change (NIPCC, 2013) and many scientists, a warming only one-third as much as the IPCC forecasts. Commenting on the success of the INM-CM model compared to the others (as shown in an earlier version of the Christy graphic), Clutz (2015) writes,

(1) INM-CM4 has the lowest CO2 forcing response at 4.1K for 4xCO2. That is 37% lower than multi-model mean.

(2) INM-CM4 has by far the highest climate system inertia: Deep ocean heat capacity in INM-CM4 is 317 W yr m⁻² K⁻¹, 200% of the mean (which excluded INM-CM4 because it was such an outlier).

(3) INM-CM4 exactly matches observed atmospheric H2O content in lower troposphere (215 hPa), and is biased low above that. Most others are biased high.

So the model that most closely reproduces the temperature history has high inertia from ocean heat capacities, low forcing from CO2 and less water for feedback. Why aren’t the other models built like this one?
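To make the comparison in points (1) and (2) concrete, here is a minimal sketch that backs out the multi-model means implied by the quoted percentages. The variable names are my own, and the implied means are simple arithmetic on the figures quoted above, not values taken from the underlying model documentation.

```python
# Figures quoted from Clutz (2015) in the list above.
inmcm4_response_k = 4.1        # INM-CM4 response to 4xCO2, in kelvin
fraction_below_mean = 0.37     # "37% lower than multi-model mean"
inmcm4_heat_capacity = 317.0   # deep ocean heat capacity, W yr m^-2 K^-1
ratio_to_mean = 2.0            # "200% of the mean"

# Back out the multi-model means that these statements imply.
implied_mean_response_k = inmcm4_response_k / (1.0 - fraction_below_mean)
implied_mean_heat_capacity = inmcm4_heat_capacity / ratio_to_mean

print(f"Implied multi-model mean 4xCO2 response: {implied_mean_response_k:.1f} K")
print(f"Implied mean deep ocean heat capacity: {implied_mean_heat_capacity:.1f} W yr m^-2 K^-1")
```

On these numbers the other models average roughly 6.5 K for 4xCO2 and about 158.5 W yr m⁻² K⁻¹ of deep ocean heat capacity, which shows how far INM-CM4 sits from the rest on both counts.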

The outputs of GCMs are only as reliable as the data and theories “fed” into them, which scientists widely recognize as being seriously deficient (Bray and von Storch, 2016; Strengers, et al., 2015). The utility and skillfulness of computer models are dependent on how well the processes they model are understood, how faithfully those processes are simulated in the computer code, and whether the results can be repeatedly tested so the models can be refined (Loehle, 2018). To date, GCMs have failed to deliver on each of these counts.

The reference above is to a study published in July 2018 by John Christy and Ross McKitrick, A Test of the Tropical 200‐ to 300‐hPa Warming Rate in Climate Models. Excerpts are in italics with my bolds.


Overall climate sensitivity to CO2 doubling in a general circulation model results from a complex system of parameterizations in combination with the underlying model structure. We refer to this as the model’s major hypothesis, and we assume it to be testable. We explain four criteria that a valid test should meet: measurability, specificity, independence, and uniqueness. We argue that temperature change in the tropical 200‐ to 300‐hPa layer meets these criteria. Comparing modeled to observed trends over the past 60 years using a persistence‐robust variance estimator shows that all models warm more rapidly than observations and in the majority of individual cases the discrepancy is statistically significant. We argue that this provides informative evidence against the major hypothesis in most current climate models.


All series‐specific trends and confidence intervals are reported in the supporting information Table S1. The mean restricted trend (without a break term) is 0.325 ± 0.132°C per decade in the models and 0.173 ± 0.056°C per decade in the observations. With a break term included they are 0.389 ± 0.173°C per decade (models) and 0.142 ± 0.115°C per decade (observed). Figure 4 shows the individual trend magnitudes. The red circles and confidence interval whiskers are from models, and the blue are observed.

Figure 4. Trend magnitudes and 95% confidence intervals. Number in upper left corner indicates number of model trends (out of 102) that exceed observed average trend.
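The size of the gap between the mean trends is easier to see as a ratio. The sketch below uses only the four mean trends quoted above; the dictionary layout and the `compare` helper are my own naming. It compares means only and is not a substitute for the paper's significance test.

```python
# Mean trends (deg C per decade) and their quoted +/- half-widths.
trends = {
    "no break":   {"models": (0.325, 0.132), "observed": (0.173, 0.056)},
    "with break": {"models": (0.389, 0.173), "observed": (0.142, 0.115)},
}

def compare(case):
    """Return (ratio, gap) of the mean model trend to the mean observed trend."""
    model_mean, _ = trends[case]["models"]
    obs_mean, _ = trends[case]["observed"]
    return model_mean / obs_mean, model_mean - obs_mean

for case in trends:
    ratio, gap = compare(case)
    print(f"{case}: models warm {ratio:.2f}x as fast, gap {gap:.3f} C/decade")
```

On these figures the models warm a little under twice as fast as observations without the break term, and closer to 2.7 times as fast with it.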

If models accurately represented the magnitude of 200‐ to 300‐hPa warming with only nonsystematic errors contributing noise, these distributions would be centered on zero. Clearly, they are centered above zero, in fact in both the restricted and general cases, the entire distribution is above zero.
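A back-of-envelope sign test illustrates why a distribution lying entirely above zero is hard to square with unsystematic error. This is my own illustration, not the test used in the paper, and it assumes each run is independent with an even chance of landing above or below observations. That assumption is generous, since many runs come from the same model and are not independent, so the true probability would be larger than this figure.

```python
from fractions import Fraction

n_runs = 102  # number of CMIP5 runs in the comparison

# Assumption for illustration only: with purely unsystematic errors, each
# run independently has a 50% chance of warming faster than observations.
p_faster = Fraction(1, 2)

# Probability that every one of the runs lands above the observed trend.
p_all_above = p_faster ** n_runs

print(f"P(all {n_runs} runs above observations) = {float(p_all_above):.2e}")
```

Even allowing for strong dependence among runs, an outcome where none of the 102 falls below observations points to a shared bias rather than noise.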

Table S2 presents individual run test results. In the restricted case, 62 of the 102 divergence terms are significant, while in the general case, 87 of 102 are. The model‐observational discrepancy is not simple uncertainty or random noise but represents a structural bias shared across models.

Worst and Best Models (Table S2)

Model           No Break    With Break
bcc‐csm1‐1         220.1         593.3
CanESM2            410.3         534.4
CCSM4              258.1         430.6
EC‐EARTH           296.0         222.5
FIO‐ESM            129.2         310.9
GISS‐E2‐H          157.3         444.8
GISS‐E2‐H‐CC       139.0         468.5
GISS‐E2‐R          382.4         237.7
HadGEM2‐ES          50.0         575.4
INMCM4               0.0           2.9

Note. First column: test score for restricted case (no break). Score is significant at 5% if it exceeds 41.53. Second column: test score for unrestricted case (with break at 1979). Score is significant at 5% if it exceeds 50.48.
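Applying the two critical values from the note to the scores listed above can be sketched as follows. The scores and thresholds are copied from the table and note; the `rejected` helper and the use of plain ASCII hyphens in the model names are my own choices.

```python
# Test scores from Table S2 as listed above: (no break, with break).
scores = {
    "bcc-csm1-1":   (220.1, 593.3),
    "CanESM2":      (410.3, 534.4),
    "CCSM4":        (258.1, 430.6),
    "EC-EARTH":     (296.0, 222.5),
    "FIO-ESM":      (129.2, 310.9),
    "GISS-E2-H":    (157.3, 444.8),
    "GISS-E2-H-CC": (139.0, 468.5),
    "GISS-E2-R":    (382.4, 237.7),
    "HadGEM2-ES":   (50.0, 575.4),
    "INMCM4":       (0.0, 2.9),
}

# 5% critical values quoted in the note.
CRIT_NO_BREAK = 41.53
CRIT_WITH_BREAK = 50.48

def rejected(model):
    """Return (rejected without break, rejected with break) for one model."""
    no_break, with_break = scores[model]
    return no_break > CRIT_NO_BREAK, with_break > CRIT_WITH_BREAK

# Models whose score stays below the critical value in both cases.
not_rejected = [m for m in scores if not any(rejected(m))]
print("Not rejected in either case:", not_rejected)
```

Of the ten runs listed, only INMCM4 stays under both thresholds. HadGEM2‐ES comes closest in the no-break case at 50.0, but that still exceeds 41.53.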


Comparing observed trends to those predicted by models over the past 60 years reveals a clear and significant tendency on the part of models to overstate warming. All 102 CMIP5 model runs warm faster than observations, in most individual cases the discrepancy is significant, and on average the discrepancy is significant. The test of trend equivalence rejects whether or not we include a break at 1979 for the PCS, though the rejections are stronger when we control for its influence. Measures of series divergence are centered at a positive mean and the entire distribution is above zero. While the observed analogue exhibits a warming trend over the test interval it is significantly smaller than that shown in models, and the difference is large enough to reject the null hypothesis that models represent it correctly, within the bounds of random uncertainty.


The reference to Clutz (2015) is to the post Temperatures According to Climate Models.

See also: 2018 Update: Best Climate Model INMCM5


  1. Bob Greene · May 24, 2019

    Good article. If you mention model failures the experts will tell you that the models predict with dead on accuracy. These are the same ones who tell me that the model outputs are data, same as measurements.


    • Ron Clutz · May 24, 2019

      And a run of the computer program is considered an “experiment” as though it were a laboratory procedure.

      • Bob Greene · May 24, 2019

        I have no problem with running those computer program “experiments”. It’s the faith put in those unvalidated experiments. I developed math/computer models of every chemical process I used or developed when I was in industry. Much less mess to clean up when your “what-if” experiment failed in the computer. We doubled the output of two complex processes based on fiddling with models. All my environmental air permits were based on models.

        The one overriding characteristic of those models is that they had to be validated against real world results. If not, the model went to electron heaven. Climate modelers are not under the constraints of having to explain why your great idea didn’t work.
