A recent paper employed expert statistical analysis to prove that current climate models fail to reproduce fluctuations of sea surface temperatures in the North Atlantic, a key region affecting global weather and climate. H/T to David Whitehouse at GWPF for posting a review of the paper. I agree with him that the analysis looks solid and the findings robust. However, as I will show below, neither Whitehouse nor the paper explicitly drew the most important implication.
At GWPF, Whitehouse writes in "Climate models fail in key test region" (excerpts in italics with my bolds):
A new paper by Timothy DelSole of George Mason University and Michael Tippett of Columbia University looks into this by attempting to quantify the consistency between climate models and observations using a novel statistical approach. It involves using a multivariate statistical framework whose usefulness has been demonstrated in other fields such as economics and statistics. Technically, they are asking if two time series such as observations and climate model output come from the same statistical source.
To do this they looked at the surface temperature of the North Atlantic which is variable over decadal timescales. The reason for this variability is disputed, it could be related to human-induced climate change or natural variability. If it is internal variability but falsely accredited to human influences then it could lead to overestimates of climate sensitivity. There is also the view that the variability is due to anthropogenic aerosols with internal variability playing a weak role but it has been found that models that use external forcing produce inconsistencies in such things as the pattern of temperature and ocean salinity. These things considered it’s important to investigate if climate models are doing well in accounting for variability in the region as the North Atlantic is often used as a test of a climate model’s capability.
The researchers found that when compared to observations, almost every CMIP5 model fails, no matter whether the multidecadal variability is assumed to be forced or internal. They also found institutional bias in that output from the same model, or from models from the same institution, tended to be clustered together, and in many cases differ significantly from other clusters produced by other institutions. Overall only a few climate models out of three dozen considered were found to be consistent with the observations.
The paper is Comparing Climate Time Series. Part II: A Multivariate Test by DelSole and Tippett. Excerpts in italics with my bolds.
We now apply our test to compare North Atlantic sea surface temperature (NASST) variability between models and observations. In particular, we focus on comparing multi-year internal variability. The question arises as to how to extract internal variability from observations. There is considerable debate about the magnitude of forced variability in this region, particularly the contribution due to anthropogenic aerosols (Booth et al., 2012; Zhang et al., 2013). Accordingly, we consider two possibilities: that the forced response is well represented by (1) a second-order polynomial or (2) a ninth-order polynomial over 1854-2018. These two assumptions will be justified shortly.
If NASST were represented on a typical 1° × 1° grid, then the number of grid cells would far exceed the available sample size. Accordingly, some form of dimension reduction is necessary. Given our focus on multi-year predictability, we consider only large-scale patterns. Accordingly, we project annual-mean NASST onto the leading eigenvectors of the Laplacian over the Atlantic between 0° and 60°N. These eigenvectors form an orthogonal set of patterns that can be ordered by a measure of length scale from largest to smallest.
Figure 1. Laplacian eigenvectors 1,2,3,4,5,6 over the North Atlantic between the equator and 60°N, where dark red and dark blue indicate extreme positive and negative values, respectively
The first six Laplacian eigenvectors are shown in fig. 1 (these were computed by the method of DelSole and Tippett, 2015). The first eigenvector is spatially uniform. Projecting data onto the first Laplacian eigenvector is equivalent to taking the area-weighted average in the basin. In the case of SST, the time series for the first Laplacian eigenvector is merely an AMV index (AMV stands for “Atlantic Multidecadal Variability”). The second and third eigenvectors are dipoles that measure the large-scale gradient across the basin. Subsequent eigenvectors capture smaller scale patterns. For model data, we use pre-industrial control simulations of SST from phase 5 of the Coupled Model Intercomparison Project (CMIP5 Taylor et al., 2012). Control simulations use forcings that repeat year after year. As a result, interannual variability in control simulations come from internal dynamical mechanisms, not from external forcing.
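To make the point about the first eigenvector concrete, here is a rough sketch of my own (not the authors' code) of an area-weighted basin average. It assumes a regular latitude-longitude grid, cos(latitude) cell weights, and NaN values marking land. The grid, the function name `area_weighted_mean`, and the test fields are all hypothetical.

```python
import numpy as np

def area_weighted_mean(field, lats_deg):
    """Mean of a (lat, lon) field weighted by cos(latitude); NaN marks land."""
    field = np.asarray(field, dtype=float)
    # On a regular lat-lon grid, cell area is proportional to cos(latitude).
    weights = np.cos(np.deg2rad(lats_deg))[:, None] * np.ones_like(field)
    valid = ~np.isnan(field)
    return float(np.sum(weights[valid] * field[valid]) / np.sum(weights[valid]))

# Hypothetical 1-degree grid over 0-60N (cell centres), 80 longitudes.
lats = np.arange(0.5, 60.0, 1.0)
lons = np.arange(-79.5, 0.0, 1.0)

# A spatially uniform anomaly with a patch of fake "land".
uniform = np.full((lats.size, lons.size), 0.3)
uniform[:10, :5] = np.nan
mean_uniform = area_weighted_mean(uniform, lats)

# A field equal to its own latitude: the weighted mean leans toward
# low latitudes, where grid cells cover more area.
gradient = np.repeat(lats[:, None], lons.size, axis=1)
mean_gradient = area_weighted_mean(gradient, lats)

print(mean_uniform, mean_gradient, lats.mean())
```

A uniform anomaly comes back unchanged, while the latitude-valued field averages below the plain arithmetic mean of the latitudes, because the cosine weights favor the tropics.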
Figure 2. AMV index from ERSSTv5 (thin grey), and polynomial fits to a second-order (thick black) and ninth-order (red) polynomial.
For observational data, we use version 5 of the Extended Reconstructed SST dataset (ERSSTv5 Huang et al., 2017). We consider only the 165-year period 1854-2018. We first focus on time series for the first Laplacian eigenvector, which we call the AMV index. The corresponding least squares fit to second- and ninth-order polynomials in time are shown in fig. 2. The second-order polynomial captures the secular trend toward warmer temperatures but otherwise has weak multidecadal variability. In contrast, the ninth-order polynomial captures both the secular trend and multidecadal variability. There is no consensus as to whether this multidecadal variability is internal or forced.
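The two detrending choices described above can be sketched in a few lines. This is my own illustration under stated assumptions, not the authors' procedure: the series is synthetic (a slow trend, a 65-year oscillation, and noise; it is not ERSSTv5 data), and the helper name `remove_polynomial` is hypothetical.

```python
import numpy as np

def remove_polynomial(years, series, order):
    """Least-squares polynomial fit in time; returns (fit, residual)."""
    # Polynomial.fit rescales the time axis internally, which keeps a
    # ninth-order fit numerically well conditioned.
    poly = np.polynomial.Polynomial.fit(years, series, deg=order)
    fit = poly(years)
    return fit, series - fit

# Synthetic stand-in for an AMV index (NOT observational data).
rng = np.random.default_rng(0)
years = np.arange(1854, 2019)            # 165 years, 1854-2018
t = years - years[0]
amv = (2e-5 * t**2
       + 0.2 * np.sin(2 * np.pi * t / 65.0)
       + 0.1 * rng.standard_normal(years.size))

fit2, resid2 = remove_polynomial(years, amv, order=2)
fit9, resid9 = remove_polynomial(years, amv, order=9)

print("residual std, 2nd order:", resid2.std())
print("residual std, 9th order:", resid9.std())
```

The ninth-order fit absorbs more of the multidecadal swing, so less of it is left in the residual that gets treated as internal variability. That is why it matters that the paper runs its test under both assumptions.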
Figure 4. Deviance between ERSSTv5 1854-1935 and 82-year segments from 36 CMIP5 pre-industrial control simulations. Also shown is the deviance between ERSSTv5 1854-1935 and ERSSTv5 1937-2018 (first item on x-axis). The black and red curves show, respectively, results after removing a second- and ninth-order polynomial in time over 1854-2018 before evaluating the deviance. The models have been ordered on the x-axis from smallest to largest deviance after removing a second-order polynomial in time.
The test was illustrated by using it to compare annual mean North Atlantic SST variability in models and observations. When compared to observations, almost every CMIP5 model differs significantly from ERSST. This conclusion holds regardless of whether a second- or ninth-order polynomial in time is regressed out. Thus, our conclusion does not depend on whether multidecadal NASST variability is assumed to be forced or internal. By applying a hierarchical clustering technique, we showed that time series from the same model, or from models from the same institution, tend to be clustered together, and in many cases differ significantly from other clusters. Our results are consistent with previous claims (Pennell and Reichler, 2011; Knutti et al., 2013) that the effective number of independent models is smaller than the actual number of models in a multi-model ensemble.
The Elephant in the Room
Now let’s consider the interpretation reached by model builders after failing to match observations of Atlantic Multidecadal Variability. As an example consider INMCM4, whose results deviated greatly from the ERSSTv5 dataset. In 2018, Evgeny Volodin and Andrey Gritsun published Simulation of observed climate changes in 1850–2014 with climate model INM-CM5. Included in those simulations is a report of their attempts to replicate North Atlantic SSTs. Excerpts in italics with my bolds.
Figure 4 The 5-year mean AMO index (K) for ERSSTv4 data (thick solid black); model mean (thick solid red). Dashed thin lines represent data from individual model runs. Colors correspond to individual runs as in Fig. 1.
Keeping in mind the argument that the GMST slowdown in the beginning of the 21st century could be due to the internal variability of the climate system, let us look at the behavior of the AMO and PDO climate indices. Here we calculated the AMO index in the usual way, as the SST anomaly in the Atlantic at latitudinal band 0–60°N minus the anomaly of the GMST. The model and observed 5-year mean AMO index time series are presented in Fig. 4. The well-known oscillation with a period of 60–70 years can be clearly seen in the observations. Among the model runs, only one (dashed purple line) shows oscillation with a period of about 70 years, but without significant maximum near year 2000. In other model runs there is no distinct oscillation with a period of 60–70 years but a period of 20–40 years prevails. As a result none of the seven model trajectories reproduces the behavior of the observed AMO index after year 1950 (including its warm phase at the turn of the 20th and 21st centuries).
One can conclude that anthropogenic forcing is unable to produce any significant impact on the AMO dynamics as its index averaged over seven realization stays around zero within one sigma interval (0.08). Consequently, the AMO dynamics are controlled by the internal variability of the climate system and cannot be predicted in historic experiments. On the other hand, the model can correctly predict GMST changes in 1980–2014 having the wrong phase of the AMO (blue, yellow, orange lines in Figs. 1 and 4).
Figure 1 The 5-year mean GMST (K) anomaly with respect to 1850–1899 for HadCRUTv4 (thick solid black); model mean (thick solid red). Dashed thin lines represent data from individual model runs: 1 – purple, 2 – dark blue, 3 – blue, 4 – green, 5 – yellow, 6 – orange, 7 – magenta. In this and the next figures numbers on the time axis indicate the first year of the 5-year mean.
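For readers who want to see the AMO index definition from the excerpt in concrete terms, here is a minimal sketch of my own. It assumes annual anomaly series are already in hand and that the 5-year means are non-overlapping blocks labeled by their first year, which is my reading of the caption and not something the paper confirms. The input series are synthetic, and the function names `amo_index` and `block_mean` are hypothetical.

```python
import numpy as np

def amo_index(na_sst_anom, gmst_anom):
    """AMO index: North Atlantic (0-60N) SST anomaly minus GMST anomaly."""
    return np.asarray(na_sst_anom, float) - np.asarray(gmst_anom, float)

def block_mean(series, years, width=5):
    """Non-overlapping block means, labeled by the first year of each block."""
    series = np.asarray(series, float)
    n_blocks = series.size // width          # drop any incomplete final block
    trimmed = series[: n_blocks * width].reshape(n_blocks, width)
    first_years = np.asarray(years)[: n_blocks * width : width]
    return first_years, trimmed.mean(axis=1)

# Synthetic annual anomalies for 1850-2014 (NOT model or observed data):
# a global warming trend, plus a 65-year oscillation in the North Atlantic.
years = np.arange(1850, 2015)
gmst = 0.005 * (years - 1850)
na_sst = gmst + 0.2 * np.sin(2 * np.pi * (years - 1850) / 65.0)

amo = amo_index(na_sst, gmst)
first_years, amo5 = block_mean(amo, years, width=5)

for y, v in zip(first_years[:4], amo5[:4]):
    print(int(y), round(float(v), 3))
```

Subtracting GMST removes the shared warming trend by construction, so only the regional oscillation is left in the index.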
The Bottom Line
Since the models incorporate AGW in the form of CO2 sensitivity, they are unable to replicate Atlantic Multidecadal Variability. Thus, the logical conclusion is that variability of North Atlantic SSTs is an internal, natural climate factor.