Fitting Sunspot Numbers and CO2 Concentrations to Temperature Anomalies

In this post, I give a simple model for the temperature anomaly using data from 1850 to 2013. The temperature anomaly data is the land and ocean series from Berkeley Earth; the sunspot data is the old series of monthly averages of daily sunspot numbers from the Royal Observatory in Belgium; and the CO2 data comes from two series from NOAA ESRL. One series, from ice cores, gives yearly CO2 levels from 1850 to 1958; the other gives monthly levels from the Mauna Loa Observatory in Hawaii from March of 1958 to 2013. I created a single time series out of the two CO2 series by adding seasonal components to each yearly point in the earlier series, where the seasonal components were estimated from the later series. I then smoothed the combined series with a linear filter 129 months wide. I also tried a filter width of 12 months, but the results were essentially the same. I smoothed the CO2 data because its seasonal variation is local to Hawaii, while the other data is global. The first sunspot observation is paired with the fourth temperature anomaly observation and the fourth CO2 observation, since I have observed that, in terms of correlation, the temperature anomaly lags three months behind the sunspot number.
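The splicing and lagging steps can be sketched roughly as follows. This is a minimal Python sketch, not the code used for the post; the function names and the toy numbers are made up for illustration, and it assumes the monthly series starts in January and covers whole years.

```python
from statistics import mean


def seasonal_components(monthly):
    """Mean deviation of each calendar month from its own year's mean.

    Assumes `monthly` starts in January and covers whole years only.
    """
    years = [monthly[i:i + 12] for i in range(0, len(monthly), 12)]
    deviations = [[value - mean(year) for value in year] for year in years]
    return [mean(dev[m] for dev in deviations) for m in range(12)]


def yearly_to_monthly(yearly, seasonal):
    """Expand each yearly value into 12 months by adding the seasonal components."""
    return [level + s for level in yearly for s in seasonal]


def pair_with_lag(leading, lagging, lag):
    """Pair leading[t] with lagging[t + lag], dropping unmatched ends."""
    n = min(len(leading), len(lagging) - lag)
    return list(zip(leading[:n], lagging[lag:lag + n]))


# Toy numbers, purely illustrative (not the NOAA or Berkeley Earth data).
pattern = [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, -1.0, -2.0, -3.0, -2.0, -1.0]
late_monthly = [320.0 + p for p in pattern] + [322.0 + p for p in pattern]
seasonal = seasonal_components(late_monthly)
early_monthly = yearly_to_monthly([300.0, 301.0], seasonal)

# With lag=3, the first leading value is paired with the fourth lagging value.
pairs = pair_with_lag([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], lag=3)
```

With a lag of 3, the first sunspot value lines up with the fourth value of the other series, which is the pairing described above.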

Fit of Temperature Anomaly to Sunspot Numbers and CO2 Concentrations 1850 to 2013

Together, the two variables, sunspots and CO2, account for about 70% of the variation in the temperature anomaly, where variation is measured by sums of squares. The regression indicates an increase of 0.045 degrees C for every 100 more sunspots and an increase of 0.92 degrees C for every 100 ppm increase in the level of CO2. The fit does not agree with the last ten years of the hiatus in temperature growth, although I originally thought it did.
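To make the two quantities concrete (slopes per unit of each predictor, and the share of the sum of squares explained), here is a minimal Python sketch of a two-predictor least-squares fit. It is not the R fit used for the post. The data are invented so that the slopes match the ones quoted above; the intercept and the function name are assumptions.

```python
def fit_two_predictors(y, x1, x2):
    """Ordinary least squares of y on x1 and x2 with an intercept.

    Returns (intercept, slope1, slope2, r_squared), where r_squared is
    1 - residual sum of squares / total sum of squares.
    """
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    c1 = [v - m1 for v in x1]
    c2 = [v - m2 for v in x2]
    cy = [v - my for v in y]
    s11 = sum(a * a for a in c1)
    s22 = sum(b * b for b in c2)
    s12 = sum(a * b for a, b in zip(c1, c2))
    s1y = sum(a * v for a, v in zip(c1, cy))
    s2y = sum(b * v for b, v in zip(c2, cy))
    det = s11 * s22 - s12 * s12  # zero if the predictors are collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    fitted = [b0 + b1 * u + b2 * v for u, v in zip(x1, x2)]
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))
    ss_tot = sum(v * v for v in cy)
    return b0, b1, b2, 1.0 - ss_res / ss_tot


# Invented, noise-free data: slopes chosen to match the post, intercept arbitrary.
sunspots = [10.0, 50.0, 120.0, 80.0, 30.0, 150.0]
co2_ppm = [285.0, 290.0, 300.0, 320.0, 350.0, 390.0]
anomaly = [-3.0 + 0.00045 * s + 0.0092 * c for s, c in zip(sunspots, co2_ppm)]

b0, b_sun, b_co2, r2 = fit_two_predictors(anomaly, sunspots, co2_ppm)
per_100_sunspots = 100 * b_sun  # degrees C per 100 more sunspots
per_100_ppm = 100 * b_co2       # degrees C per 100 ppm more CO2
```

Because the toy data contain no noise, the R squared here is essentially 1; with the real series it was about 0.70.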

While the fit does not perfectly satisfy the linear regression assumptions, it is not too bad. The diagnostic plots from R are given below.

Diagnostic Plots

This is a very simple model, and it does not account for other greenhouse gases, other components of the atmosphere, or the differing composition of different places on the earth. That said, temperature is a measure of energy; the two are linearly related. So using sunspots, which, for the data we have, are a good proxy for energy density from the sun, makes sense. I do not know enough physics to say if the energy returned to the earth by greenhouse gases is linearly related to greenhouse gas concentration, but the plots indicate that it probably is.

Corrected Plots for Filtered Data and Phases for the 1850 to 2015 Data and the 1750 to 2015 Data

When I filtered the sunspot and temperature data, I mistakenly used a bandwidth of 257 months rather than 129 months. The corrected plots are below.

filtered correctly

Once again, the sunspot data is the mean of the daily sunspot numbers over each month from the Royal Observatory in Belgium and the temperature anomaly data is from Berkeley Earth and is the land and ocean series.

Here are the plots for the new Belgian series and the Berkeley Earth temperature anomaly series for land data, which goes back to 1750.

The contrast between the two sets of series may be because the temperature anomaly data in the first set is land and ocean data, while the temperature data in the second, longer set is just land data. The radiation from the ocean is much more uniform than the radiation from land. Also, I do not think we can be certain that the fluctuation in solar radiation, which has been observed to follow the sunspot cycle during the time that we have had satellite measurements (since 1978, I believe), followed sunspot numbers closely in the past.

Phase Plots between Centered Sun Spot Numbers and Centered Temperature Anomalies

In this post, I have two sets of plots of the phase between centered sunspot numbers and centered temperature anomalies, where the centering is done by subtracting the 128-month linearly filtered data from the original time series. Also plotted is the phase divided by the frequency, or tau. I apologize to anyone I confused by my earlier uncentered plots.

If the slope of the phase plot is constant, then there is a constant lag between the two series. The sign of the slope indicates which series precedes the other. In my plots a negative slope indicates that the second series lags behind the first. (See Introduction to Statistical Time Series, W.A. Fuller, 1976, Wiley, pp. 152-153 and pp. 308-324.)
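The link between a constant lag and a constant phase slope can be checked on a toy example. The Python sketch below is not the smoothed spectral estimate behind the plots. It uses a raw discrete Fourier transform, a made-up random series, a circular shift of 3 steps, and one particular sign convention (conjugate of the first transform times the second); all of these are assumptions chosen so that the second series lagging the first gives a negative slope.

```python
import cmath
import math
import random


def dft(series, k):
    """Discrete Fourier transform of `series` at integer frequency index k."""
    n = len(series)
    return sum(v * cmath.exp(-2j * math.pi * k * t / n)
               for t, v in enumerate(series))


def phase_and_tau(first, second, k):
    """Phase of the raw cross-spectrum at index k, and tau = phase / frequency.

    Sign convention (an assumption): conj(DFT(first)) * DFT(second).
    """
    omega = 2 * math.pi * k / len(first)  # frequency in radians per time step
    cross = dft(first, k).conjugate() * dft(second, k)
    phase = cmath.phase(cross)
    return phase, phase / omega


rng = random.Random(0)
n, lag = 64, 3
first = [rng.gauss(0.0, 1.0) for _ in range(n)]
# The second series is the first one delayed by `lag` steps (circularly).
second = [first[(t - lag) % n] for t in range(n)]

# Low frequencies only, so the phase does not wrap around past -pi.
results = [phase_and_tau(first, second, k) for k in range(1, 6)]
taus = [tau for _, tau in results]
```

In this toy case the phase falls on a straight line through zero with slope minus the lag, so tau is the same (about -3) at each of the frequencies tried.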

The first set of plots uses the average of the daily sunspot numbers, averaged over each month, from WDC-SILSO at the Royal Observatory of Belgium, Brussels, where the data is the previous data from the site (the data changed in July of 2015), and the temperature anomaly from Berkeley Earth, which is the monthly global average temperature data for land and ocean. The temperature series goes from 1851 to 2014. There appears to be a lag of about four or five months from the sunspot incidence to changes in the temperature anomaly.

The second set of plots uses the current sunspot data from Belgium and the land temperature anomaly from Berkeley Earth. The Berkeley Earth data goes back to 1750, the Belgian data to 1749. There does not appear to be a lag from the sunspot incidence to the temperature anomaly.

The confidence intervals should be taken with a grain of salt, since the sunspot series is not a stationary time series.

Plots of Sun Spot Average Numbers and Temperature Anomaly mid 1700’s to 2015 – New Data

Here are new plots for sun spot number and temperature anomaly, going back to 1749 for the sun spots and to 1750 for temperature anomalies.  The sun spot data is a revised series of monthly averages of daily sun spot numbers from SIDC in Belgium and the temperature data is from Berkeley and for land temperatures.

 Plot of raw sun spot averages and temperature anomalies - 1750 to 2015

plot of filtered sunspots and temperature anomalies - 1750 to 2015.

The temperature data is a lot noisier before 1850. The filter is a 128-month linear filter, where 128 months is the estimated length of the sunspot cycle.

Sunspot and Temperature Anomaly Data and their Bivariate Phase Spectrum

I start this blog post with a plot of the sunspot and temperature anomaly data that I have been using.  The data is from 1850 to 2013.

raw sunspot and temperature anomaly plots 1850 to 2013

The following plots are of the phase spectrum between the sunspot time series and the temperature anomaly time series. Also, the variable tau is plotted next to the spectrum. Tau is the phase spectrum divided by the frequency. For a line fitted to a stretch of the phase spectrum curve, tau would give the slope if the intercept were zero. The first set of plots uses all of the data. The second set of plots uses just the first 250 frequencies. There are 1968 months in each time series.


From the plot, frequencies up to around 275 cycles per 1968 periods (1968 periods divided by 275 cycles equals about 7 periods per cycle) have a mainly increasing phase relationship. From about 300 cycles per 1968 periods to 800 cycles per 1968 periods (about 7 periods per cycle to about 2.5 periods per cycle), the slope is mainly decreasing. For frequencies greater than 800 cycles per 1968 periods, the slope is mainly flat.

Constant positive slopes indicate that sunspots precede temperature anomaly with a constant lag. Constant negative slopes indicate that temperature anomaly precedes sunspots with a constant lag. A flat slope indicates no lagged relationship.

phase spectrum and tau for 250 frequencies

Looking at the plots of tau above, out to around 22 cycles per 1968 periods (about 90 periods per cycle), tau was changing quite a bit. Out beyond around 250 cycles per 1968 periods (about 8 periods per cycle), tau does not change much.
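The conversions from cycles per series length to months per cycle quoted in this post are simple divisions. A small Python sketch follows, assuming the series length of 1968 months (164 years times 12); the function name is made up.

```python
N_MONTHS = (2013 - 1850 + 1) * 12  # 164 years of monthly data


def months_per_cycle(cycles, n_periods=N_MONTHS):
    """Length of one cycle in months, for a frequency given in cycles per series."""
    return n_periods / cycles


# Frequencies mentioned in the text, in cycles per 1968 months.
periods = {cycles: months_per_cycle(cycles) for cycles in (22, 250, 275, 300, 800)}
```

For example, `months_per_cycle(275)` is a little over 7 months, and `months_per_cycle(22)` is a little under 90 months.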

Running Averages for a Sunspot Period of 129 Months and a Model Fit for the First 69 Years 7 Months

I have found the source of my temperature anomaly data at Berkeley Earth; the data is the monthly global average temperature data for land and ocean. It took several hours of searching.

In my first graph of this blog I plot the graph of the last blog using a period of 129 months for the sunspot cycle. To choose the period, I used the F test from Fuller’s Introduction to Statistical Time Series, 1976, Wiley, p. 282 to compare different periods close to 132. The period of 129 had the largest F value. The period is 10 years and 9 months.

The second graph I put up is of two plots. The first plot is of the temperature anomaly shown above along with a curve of predicted values, where the predictions are from the model generated by the regression of the 88th through 922nd observations of the temperature anomaly vector on the 1st through the 835th observations of the sunspot vector. The model has the largest R squared of the several hundred single variable lag models at which I looked. The best fitting model was for a lag of 87 months fitting 835 observations.
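A search over single-variable lag models of this kind can be sketched as below. This is a minimal Python sketch, not the code used for the post: it varies only the lag, holds the number of observations fixed, and runs on made-up data with a known lag of 4. The function names are hypothetical.

```python
import random


def r_squared(x, y):
    """R squared of the simple linear regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)


def best_lag(predictor, response, n_obs, max_lag):
    """Regress response[lag : lag + n_obs] on predictor[:n_obs] for each lag.

    Returns the lag with the largest R squared and all the scores.
    Requires len(response) >= max_lag + n_obs.
    """
    scores = {
        lag: r_squared(predictor[:n_obs], response[lag:lag + n_obs])
        for lag in range(max_lag + 1)
    }
    return max(scores, key=scores.get), scores


# Made-up data: the response is an exact linear function of the predictor,
# delayed by 4 steps (the first 4 response values are padding).
rng = random.Random(1)
predictor = [rng.gauss(0.0, 1.0) for _ in range(60)]
response = [0.0] * 4 + [2.0 * v + 1.0 for v in predictor]

lag, scores = best_lag(predictor, response, n_obs=40, max_lag=10)
```

With a lag of 87 and 835 observations, the slices in this sketch would be `predictor[:835]` and `response[87:922]`, which correspond to the 1st through 835th and the 88th through 922nd observations.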


The second plot is the difference between the two curves in the first plot.

This is a very simple model.  I was surprised by the drop in the differences in the 1950’s and early 1960’s, but the trend is of an increasing difference as time goes on.

Running Averages of Global Average Temperature and Sunspot Numbers

In this blog post, I have uploaded a graph of the running average over 132 months (11 years) of the average number of daily sunspots per month, as found by the Royal Observatory of Belgium (Av. Circulaire, 3, B-1180 Brussels, Belgium), and the running average over 132 months of the global average temperature anomaly, taken from a source I have lost and cannot find. I believe the temperature anomalies are for air temperature over the ocean. The years are 1850 to 2013. I created the graph.

Time series temperature anomaly and sunspot numbers

From 1855 to 1900, it appears that changes in the temperature anomaly follow the sunspot numbers by a year or two. After 1900, temperatures loosely increased as sunspots increased, but not as closely as from 1855 to 1900. By the average at 2008, there was no connection.

The source of just about all heat coming into this planet is the sun. Sunspots appear to be related to the solar flux: more sunspots correlate with a denser flow of energy, so one would expect the temperature of the earth to correlate with the sunspots.

I tried to take out the cycle of the sun spots and focus on the level of sunspot activity by taking a running average of the observations over the approximately 11 year sunspot cycle.  The averages run from July of 1855 to June of 2008.
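A running average of this kind can be sketched in a few lines. This is a minimal Python sketch with equal weights over a plain window, assuming nothing about how each average is assigned to a calendar month; the function name is made up.

```python
def running_average(series, window):
    """Equal-weight running average; returns len(series) - window + 1 values."""
    total = sum(series[:window])
    averages = [total / window]
    for i in range(window, len(series)):
        # Slide the window one step: add the new value, drop the oldest one.
        total += series[i] - series[i - window]
        averages.append(total / window)
    return averages


small = running_average([1, 2, 3, 4, 5], window=3)

# A 132-month window over 1968 monthly values (placeholder data).
smoothed = running_average(list(range(1968)), window=132)
```

Averaging over a full cycle length removes any component that repeats exactly with that period, which is the reason for choosing a window of about 11 years.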

The plots indicate to me that there is more to global temperature than the sunspot cycle over the years for which we have data.

On Diverging Increasing Time Series

In the second to the last blog post, I commented that the divergence of the time series of the mean income from the median income indicated that the distribution of income is becoming more skewed toward the wealthy.  However if one time series is a constant proportion of another, the two series will diverge as they increase, with the difference between the series increasing at a constant rate, also.  See the figure below.

constant proportion ts

The difference divided by the original time series is a constant, namely one minus the ratio of the second time series to the first, in this case 0.10.
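The arithmetic can be checked on a toy example. In the Python sketch below the numbers are made up: the first series increases linearly, and the second is assumed to be 0.90 of the first, which matches the 0.10 above.

```python
proportion = 0.90  # assumed ratio of the second series to the first

first = [100.0 + 5.0 * t for t in range(10)]  # made-up increasing series
second = [proportion * value for value in first]

# The gap widens even though the ratio between the series never changes.
difference = [a - b for a, b in zip(first, second)]
increments = [later - earlier for earlier, later in zip(difference, difference[1:])]

# Dividing by the first series removes the growth: every entry is 1 - proportion.
relative_difference = [d / a for d, a in zip(difference, first)]
```

The difference grows by the same amount at each step, while the relative difference stays at 0.10 throughout.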

Below, I have plotted the difference between mean income and median income, with the difference divided by the mean income.  The years are 1947 to 2013 and the income was measured in 2013 dollars.

prop dif income mn md 47to13

From the income time series, the proportionate difference has an increasing trend, rather than being constant, indicating to me that the distribution is indeed becoming more skewed over time.  The biggest increase was from 1952 to 1961, all but two of which were Eisenhower years.

The data are from Table P. 4.

US Debt and GDP

In this blog post, I give the level of debt outstanding and the gross domestic product of the US  for the years 1940 to 2012.

gdp.and.debt 40to12

You can see that by 2012, debt outstanding was up to the level of gross domestic product.  What is interesting is that back in the 1940’s the same was true.

Debt has increased with gross domestic product over the years.  Note that this plot is in current dollars, not constant dollars, so some of the growth has been inflation.  You can see clearly that during Clinton’s administration the increase in the debt slowed down to pretty much a stop and that since 2008, the debt has grown at a faster pace than the gross domestic product.  The large pickup in the growth of the debt started with Reagan’s administration, after the “stagflation” of Carter’s years.

Federal revenues mostly come from taxes, such as income, payroll, or corporate taxes, which decrease when the economy slows down.  At the same time, when the economy slows down, demand for government services increases.   So, we see the debt increasing quickly over the last several years since the economy has been slow in recovering from the 2008 crash.

The source of the gross domestic product series is the Bureau of Economic Analysis under the Department of Commerce. The source of the data on debt outstanding is the US Treasury.

Some Time Series of US Personal Income

In this post, I give some time series taken from the Census Bureau website.

First, from Table P. 5, I give median personal income for persons 14 or 15 years old or older (with the minimum age depending on the year of the data point). The data are in 2013 and current dollars, and the years of data go from 1947 to 2013.

personal income by sex

The plot shows that the persistent gains in median personal income over the last 45 years have gone to females, but that females are still well below males in terms of personal income.

From the same website, Table P. 7, mean and median income in current and 2013 dollars are given for the years 1974 to 2013.  The ages covered are the same as with the first plot.  The plot shows that mean and median income are diverging.

median and mean income 74to13

The divergence indicates that the distribution of income is becoming more skewed toward the wealthy, as has been talked about in the press and by many politicians.