Residual Sum of Squares (RSS): What It Is and How to Calculate It

What Is the Residual Sum of Squares (RSS)?

The residual sum of squares (RSS) is a statistical measure of the amount of variance in a data set that is not explained by a regression model. It captures the variation left over in the residuals, or error term, after the model has been fitted.

Linear regression is a statistical method that helps determine the strength of the relationship between a dependent variable and one or more other factors, known as independent or explanatory variables.

Key Takeaways

  • The residual sum of squares (RSS) measures the level of variance in the error term, or residuals, of a regression model.
  • The smaller the residual sum of squares, the better your model fits your data; the greater the residual sum of squares, the poorer your model fits your data. 
  • A value of zero means your model is a perfect fit.
  • Statistical models are used by investors and portfolio managers to track an investment's price and use that data to predict future movements.
  • The RSS is used by financial analysts in order to estimate the validity of their econometric models.

Understanding the Residual Sum of Squares (RSS)

In general terms, the sum of squares is a statistical technique used in regression analysis to determine the dispersion of data points. In a regression analysis, the goal is to determine how well a data series can be fitted to a function that might help to explain how the data series was generated. The sum of squares is used as a mathematical way to find the function that best fits (varies least from) the data.

The RSS measures the amount of error remaining between the regression function and the data set after the model has been run. A smaller RSS figure represents a regression function that is well-fit to the data.

The RSS, also known as the sum of squared residuals, essentially determines how well a regression model explains or represents the data in the model.

How to Calculate the Residual Sum of Squares

$$\mathrm{RSS} = \sum_{i=1}^{n} \left( y_i - f(x_i) \right)^2$$
Where:
y_i = the ith value of the variable to be predicted
f(x_i) = predicted value of y_i
n = upper limit of summation (the number of observations)
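
The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `residual_sum_of_squares` and the sample numbers are assumptions made up for this example.

```python
def residual_sum_of_squares(observed, predicted):
    """Sum of squared differences between observed and predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return sum((y - f) ** 2 for y, f in zip(observed, predicted))


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
observed = [3.0, 5.0, 7.5, 9.0]
predicted = [2.5, 5.5, 7.0, 9.5]

rss = residual_sum_of_squares(observed, predicted)
print(rss)  # every residual is +0.5 or -0.5, so RSS = 4 * 0.25 = 1.0
```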

Residual Sum of Squares (RSS) vs. Residual Standard Error (RSE)

The residual standard error (RSE) is another statistical term. It estimates the standard deviation of the residuals, that is, the typical distance between the observed values and the values predicted by a regression. It is a goodness-of-fit measure that can be used to analyze how well a model fits a set of data points.

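For a simple linear regression, RSE is computed by dividing the RSS by the number of observations in the sample less 2 (one degree of freedom for each estimated parameter, the slope and the intercept), and then taking the square root:

$$\mathrm{RSE} = \sqrt{\frac{\mathrm{RSS}}{n - 2}}$$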
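
As a rough sketch of that calculation, the snippet below computes the RSE from observed and predicted values. The function name, the `n_params` argument (defaulting to 2 for a slope and an intercept), and the data are assumptions for illustration only.

```python
import math


def residual_standard_error(observed, predicted, n_params=2):
    """Square root of RSS divided by (n - n_params)."""
    n = len(observed)
    if n <= n_params:
        raise ValueError("need more observations than estimated parameters")
    rss = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    return math.sqrt(rss / (n - n_params))


# Hypothetical data: RSS is 1.0 and n - 2 is 2, so RSE = sqrt(0.5).
observed = [3.0, 5.0, 7.5, 9.0]
predicted = [2.5, 5.5, 7.0, 9.5]

rse = residual_standard_error(observed, predicted)
print(round(rse, 4))
```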

Minimizing RSS for Optimal Fit

In the realm of regression analysis, minimizing the residual sum of squares is crucial for achieving the best possible fit of a model to the data. Among the different techniques to make this happen, one of the most fundamental and widely used approaches is least squares regression.

Least squares regression is a method that aims to find the line or curve that minimizes the sum of the squared differences. These differences will be between the observed values and the values predicted by the model. In essence, the least squares regression seeks to strike a balance where the model captures the underlying trend of the data while still minimizing the discrepancies between what's been observed and what's been predicted.

The process of minimizing RSS through least squares regression involves choosing the model parameters that make the RSS as small as possible. For a simple linear regression model, this entails finding the slope and intercept of the line that best fits the data, and both can be computed directly from closed-form formulas. In more complex scenarios, such as nonlinear models, the parameters are usually adjusted iteratively until the RSS stops decreasing, but many of the same principles apply.
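
For a simple linear regression, the slope and intercept that minimize RSS have a well-known closed-form solution. The sketch below applies it to a small made-up data set and checks that nudging either parameter away from the fitted value makes the RSS larger. The helper names `fit_line` and `rss_for_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rss_for_line(xs, ys, slope, intercept):
    """RSS of the line y = slope * x + intercept on the given data."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data that lies close to a straight line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(xs, ys)
best = rss_for_line(xs, ys, slope, intercept)

# Any other line has a larger RSS than the fitted one.
nudged_slope = rss_for_line(xs, ys, slope + 0.1, intercept)
nudged_intercept = rss_for_line(xs, ys, slope, intercept + 0.1)
print(best < nudged_slope, best < nudged_intercept)
```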

Limitations of RSS

RSS has some limitations. First, because every residual is squared, large residuals count far more than small ones. This means that outliers can disproportionately influence the RSS and pull the estimated coefficients away from the values that best describe the rest of the data. Another downside is that RSS relies on several assumptions. If any assumption such as linearity, independence of errors, or homoscedasticity is violated, RSS may lead to biased estimates and incorrect inferences.
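
The effect of a single outlier can be seen with a toy calculation. The residuals below are invented for illustration; the point is only to compare how much of the total one large residual accounts for when residuals are squared versus when absolute values are used.

```python
# Hypothetical residuals: four small ones and one outlier.
residuals = [1.0, -1.0, 2.0, -2.0, 10.0]

squared = [r ** 2 for r in residuals]
rss = sum(squared)  # 1 + 1 + 4 + 4 + 100

# Share of the total contributed by the outlier alone.
outlier_share_squared = squared[-1] / rss
outlier_share_absolute = abs(residuals[-1]) / sum(abs(r) for r in residuals)

print(rss)
print(round(outlier_share_squared, 3), outlier_share_absolute)
```

With squaring, the one outlier accounts for roughly nine-tenths of the total, compared with 62.5% when absolute residuals are summed instead.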

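While RSS is useful for evaluating the fit of a single model, comparing the fit across multiple models using RSS alone can be misleading. This is because RSS depends on the number of parameters in the model: in a least squares fit, adding parameters never increases the RSS and usually lowers it, whether or not the extra parameters are meaningful. It isn't really meant to compare models with a different number of parameters.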
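
A small sketch of why raw RSS favors the model with more parameters: below, a one-parameter model (a constant equal to the mean) is compared with a two-parameter least squares line on the same made-up data. The data and variable names are assumptions for illustration.

```python
def rss(observed, predicted):
    return sum((y - f) ** 2 for y, f in zip(observed, predicted))


# Hypothetical data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(xs)

# Model A: one parameter, predict the mean of y everywhere.
mean_y = sum(ys) / n
rss_constant = rss(ys, [mean_y] * n)

# Model B: two parameters, the least squares line.
mean_x = sum(xs) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x
rss_line = rss(ys, [slope * x + intercept for x in xs])

# The larger model can always reproduce the smaller one (slope = 0),
# so its minimized RSS cannot be higher.
print(rss_line <= rss_constant)
```

Because the lower RSS is guaranteed by construction, it says nothing on its own about whether the extra parameter is justified.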

Last, while RSS is easy to compute and interpret, it provides limited insight into the underlying structure of the data. In cases where understanding the relationship between predictors and the response variable is important, there may be better metrics to use. In some ways, RSS can act somewhat like a black box where the relationships aren't entirely known; only the end value is of most importance.

Special Considerations

Financial markets have increasingly become more quantitatively driven; as such, in search of an edge, many investors are using advanced statistical techniques to aid in their decisions. Big data, machine learning, and artificial intelligence applications further necessitate the use of statistical properties to guide contemporary investment strategies. The residual sum of squares—or RSS statistics—is one of many statistical properties enjoying a renaissance.

Statistical models are used by investors and portfolio managers to track an investment's price and use that data to predict future movements. The study—called regression analysis—might involve analyzing the relationship in price movements between a commodity and the stocks of companies engaged in producing the commodity.

Finding the residual sum of squares by hand can be difficult and time-consuming. Because it involves a lot of subtracting, squaring, and summing, the calculations can be prone to errors. For this reason, you may decide to use software, such as Excel, to do the calculations.

Any model might have differences between the predicted values and actual results. Although part of the variation in the data is explained by the regression analysis, the RSS represents the variation, or error, that is not explained.

Since a sufficiently complex regression function can be made to closely fit virtually any data set, further study is necessary to determine whether the regression function is, in fact, useful in explaining the variance of the dataset.

Typically, however, a smaller or lower value for the RSS is ideal in any model since it means less of the variation in the data set is left unexplained. In other words, the lower the sum of squared residuals, the better the regression model is at explaining the data.

Example of the RSS

For a simple (but lengthy) demonstration of the RSS calculation, consider the well-known correlation between a country's consumer spending and its GDP. The following chart reflects the published values of consumer spending and Gross Domestic Product for the 27 states of the European Union. Note that this information may have slightly changed since it has been published, but the example of residual sum of squares remains valid.

Consumer Spending vs. GDP for EU Member States
Country Consumer Spending (Millions) GDP (Millions)
Austria 309,018.88 433,258.47
Belgium 388,436.00 521,861.29
Bulgaria 54,647.31 69,889.35
Croatia 47,392.86 57,203.78
Cyprus 20,592.74 24,612.65
Czech Republic 164,933.47 245,349.49
Denmark 251,478.47 356,084.87
Estonia 21,776.00 30,650.29
Finland 203,731.24 269,751.31
France 2,057,126.03 2,630,317.73
Germany 2,812,718.45 3,846,413.93
Greece 174,893.21 188,835.20
Hungary 110,323.35 155,808.44
Ireland 160,561.07 425,888.95
Italy 1,486,910.44 1,888,709.44
Latvia 25,776.74 33,707.32
Lithuania 43,679.20 56,546.96
Luxembourg 35,953.29 73,353.13
Malta 9,808.76 14,647.38
Netherlands 620,050.30 913,865.40
Poland 453,186.14 596,624.36
Portugal 190,509.98 228,539.25
Romania 198,867.77 248,715.55
Slovak Republic 83,845.27 105,172.56
Slovenia 37,929.24 53,589.61
Spain 997,452.45 1,281,484.64
Sweden 382,240.92 541,220.06
World Bank

Consumer spending and GDP have a strong positive correlation, and it is possible to predict a country's GDP based on consumer spending (CS). Using the formula for a best fit line, this relationship can be approximated as:

GDP = 1.3232 x CS + 10447

The units for both GDP and Consumer Spending are in millions of U.S. dollars.

This formula is highly accurate for most purposes, but it is not perfect, due to the individual variations in each country's economy. The following chart compares the projected GDP of each country, based on the formula above, and the actual GDP as recorded by the World Bank.

Projected and Actual GDP Figures for EU Member States, and Residual Squares
Country Consumer Spending Most Recent Value (Millions) GDP Most Recent Value (Millions) Projected GDP (Based on Trendline) Residual Square (Projected - Real)^2
Austria 309,018.88 433,258.47 419,340.782016 193,702,038.819978
Belgium 388,436.00 521,861.29 524,425.52 6,575,250.87631504
Bulgaria 54,647.31 69,889.35 82,756.320592 165,558,932.215393
Croatia 47,392.86 57,203.78 73,157.232352 254,512,641.947534
Cyprus 20,592.74 24,612.65 37,695.313568 171,156,086.033474
Czech Republic 164,933.47 245,349.49 228,686.967504 277,639,655.929706
Denmark 251,478.47 356,084.87 343,203.311504 165,934,549.28587
Estonia 21,776.00 30,650.29 39,261.00 74,144,381.8126542
Finland 203,731.24 269,751.31 280,024.176768 105,531,791.633079
France 2,057,126.03 2,630,317.73 2,732,436.162896 10,428,174,337.1349
Germany 2,812,718.45 3,846,413.93 3,732,236.05304 13,036,587,587.0929
Greece 174,893.21 188,835.20 241,865.695472 2,812,233,450.00581
Hungary 110,323.35 155,808.44 156,426.85672 382,439.239575558
Ireland 160,561.07 425,888.95 222,901.407824 41,203,942,278.6534
Italy 1,486,910.44 1,888,709.44 1,977,926.894208 7,959,754,135.35658
Latvia 25,776.74 33,707.32 44,554.782368 117,667,439.825176
Lithuania 43,679.20 56,546.96 68,243.32 136,804,777.364243
Luxembourg 35,953.29 73,353.13 58,020.393328 235,092,813.852894
Malta 9,808.76 14,647.38 23,425.951232 77,063,312.875298
Netherlands 620,050.30 913,865.40 830,897.56 6,883,662,978.71
Poland 453,186.14 596,624.36 610,102.900448 181,671,052.608372
Portugal 190,509.98 228,539.25 262,529.805536 1,155,357,865.6459
Romania 198,867.77 248,715.55 273,588.833264 618,680,220.331183
Slovak Republic 83,845.27 105,172.56 121,391.061264 263,039,783.25037
Slovenia 37,929.24 53,589.61 60,634.970368 49,637,102.7149851
Spain 997,452.45 1,281,484.64 1,330,276.08184 2,380,604,796.8261
Sweden 382,240.92 541,220.06 516,228.185344 624,593,798.821215
World Bank

The column on the right indicates the residual squares, the squared difference between each projected value and its actual value. The numbers appear large, but for the best fit line their sum is lower than the RSS for any other possible trendline. If a different line had a lower RSS for these data points, that line would be the best fit line.
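
The per-country arithmetic can be reproduced with a short script. The sketch below uses the trendline GDP = 1.3232 x CS + 10447 and three rows copied from the table (Austria, Belgium, and Estonia); the variable and function names are assumptions, and the sum it prints covers only these three rows, not all 27 countries.

```python
SLOPE = 1.3232
INTERCEPT = 10447.0

# (country, consumer spending, actual GDP), in millions, from the table.
rows = [
    ("Austria", 309018.88, 433258.47),
    ("Belgium", 388436.00, 521861.29),
    ("Estonia", 21776.00, 30650.29),
]


def projected_gdp(consumer_spending):
    """GDP predicted by the trendline for a given level of consumer spending."""
    return SLOPE * consumer_spending + INTERCEPT


residual_squares = {}
for country, cs, actual in rows:
    projected = projected_gdp(cs)
    residual_squares[country] = (projected - actual) ** 2
    print(f"{country}: projected {projected:,.2f}, "
          f"residual square {residual_squares[country]:,.2f}")

# Partial RSS over the three sample rows only.
partial_rss = sum(residual_squares.values())
print(f"Partial RSS: {partial_rss:,.2f}")
```

Summing the residual squares over every row of the table in the same way gives the RSS for the trendline.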

Is the Residual Sum of Squares the Same as R-Squared?

The residual sum of squares (RSS) is the absolute amount of unexplained variation, whereas R-squared is the explained variation expressed as a proportion of total variation. The two are linked by R-squared = 1 - RSS/TSS, where TSS is the total sum of squares.

Is RSS the Same as the Sum of Squared Estimate of Errors (SSE)?

The residual sum of squares (RSS) is also known as the sum of squared estimate of errors (SSE).

What Is the Difference Between the Residual Sum of Squares and Total Sum of Squares?

The total sum of squares (TSS) measures how much variation there is in the observed data, while the residual sum of squares measures the variation in the error between the observed data and modeled values. In statistics, the values for the residual sum of squares and the total sum of squares (TSS) are oftentimes compared to each other.
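
One common way the two quantities are compared is through R-squared, which equals 1 - RSS/TSS for a least squares fit with an intercept. The sketch below uses made-up data and the least squares line for that data (slope 1.99, intercept 0.05); all names and numbers are illustrative assumptions.

```python
# Hypothetical data and the least squares line fitted to it.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = 1.99, 0.05

predicted = [slope * x + intercept for x in xs]
mean_y = sum(ys) / len(ys)

# TSS: variation of the observed values around their mean.
tss = sum((y - mean_y) ** 2 for y in ys)

# RSS: variation of the observed values around the model's predictions.
rss = sum((y - f) ** 2 for y, f in zip(ys, predicted))

r_squared = 1 - rss / tss
print(round(tss, 3), round(rss, 3), round(r_squared, 4))
```

Here the RSS is a small fraction of the TSS, so R-squared is close to 1.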

Can a Residual Sum of Squares Be Zero?

The residual sum of squares can be zero. The smaller the residual sum of squares, the better your model fits your data; the greater the residual sum of squares, the poorer your model fits your data. A value of zero means your model is a perfect fit.

The Bottom Line

Residual sum of squares quantifies the discrepancy between observed data points and the predictions made by a regression model, calculated as the sum of the squared residuals. Minimizing RSS is a fundamental objective in regression analysis, because RSS represents the variability in the data that the model fails to capture.

Article Sources
  1. World Bank. "GDP (Current US$) – European Union."

  2. World Bank. "Final Consumption Expenditure (Current $) – European Union."
