Regression Analysis | Statistics

After having established that two variables are closely related, we may be interested in estimating the value of one variable given the value of the other. Regression is a measure of the average relationship between two or more variables in terms of the original units of the data.

Several costs, such as electricity charges and maintenance, vary with the volume of output, though not in the same proportion. Thus, when such expenses are to be estimated using simple regression analysis, volume is taken as the independent variable and expenses as the dependent variable.


The regression line is usually fitted by what is known as the ‘method of least squares’. In regression analysis one variable is taken as dependent and the other as independent, which makes it possible to study a cause-and-effect relationship. It should be noted, however, that the presence of association does not imply causation, although the existence of causation always implies association.

Statistical evidence can only establish the presence or absence of association between variables; whether causation exists depends purely on reasoning. The closer the relationship between two variables, the greater the confidence that may be placed in the estimates. Variables may have either a linear or a non-linear relationship.
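The text does not name a particular measure of how close a relationship is; one common choice is Pearson's correlation coefficient r, which ranges from -1 (perfect negative linear association) to +1 (perfect positive linear association). The Python sketch below is a minimal illustration under that assumption; the function name and the monthly figures are hypothetical, not taken from this article.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Cross-products and squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sequence has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: units produced and electricity cost (Rs.) for five months
units = [100, 150, 200, 250, 300]
cost = [1200, 1450, 1750, 1950, 2300]
print(f"r = {pearson_r(units, cost):.3f}")  # prints r = 0.997 (strong positive association)
```

An r this close to +1 would, in the article's terms, justify more confidence in estimates read off the regression line.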

Two variables are said to have a linear relationship when a change of one unit in the ‘independent variable’ leads to a constant absolute change in the ‘dependent variable’. When two variables have a linear relationship, the regression line can be used to estimate values of the dependent variable. When the variables are plotted on a scatter diagram, the ‘line of best fit’ through the plotted points is called the ‘regression line’.

The regression line is based on an equation called the ‘regression equation’, which gives the best estimate of one variable when the other is known or given. This method may be applied to the analysis of costs to segregate the variable and fixed elements and to determine their variability in relation to changes in volume.

y = a + bx

where:

y = Total cost

x = No. of units

a = Fixed cost

b = Variable cost per unit
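The least-squares estimates of a and b have the standard closed form b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and a = ȳ − b·x̄. The Python sketch below applies it to hypothetical monthly output and cost figures (not from this article) to split total cost into a fixed element a and a variable cost per unit b; the function name fit_cost_line is illustrative only.

```python
def fit_cost_line(units, costs):
    """Least-squares estimates for y = a + bx.

    Returns (a, b): a is the estimated fixed cost, b the variable cost per unit.
    """
    n = len(units)
    if n != len(costs) or n < 2:
        raise ValueError("need matching lists with at least two observations")
    mean_x = sum(units) / n
    mean_y = sum(costs) / n
    sxx = sum((x - mean_x) ** 2 for x in units)
    if sxx == 0:
        raise ValueError("all x values are identical, so the slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(units, costs))
    b = sxy / sxx            # slope: variable cost per unit
    a = mean_y - b * mean_x  # intercept: fixed cost
    return a, b


# Hypothetical monthly observations: units produced and total cost (Rs.)
units = [100, 150, 200, 250, 300]
costs = [1200, 1450, 1750, 1950, 2300]

a, b = fit_cost_line(units, costs)
print(f"Fixed cost a = Rs. {a:.2f}, variable cost b = Rs. {b:.2f} per unit")
# Estimate total cost for a planned output of 220 units
print(f"Estimated cost at 220 units = Rs. {a + b * 220:.2f}")
```

With these made-up figures the fit gives a = Rs. 650 and b = Rs. 5.40 per unit, so the estimated cost at 220 units is Rs. 1,838.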


Multiple Regression Analysis:

In the simple regression technique described so far, there is an assumed relationship between one dependent variable (y) and one independent variable (x). Multiple regression analysis, in contrast, involves three or more variables: there is still one dependent variable (y), but there are now two or more independent variables. Even without knowing how to solve a multiple regression problem, it is useful to be aware of its broad outline.

(1) As with linear regression, the total function for ‘y’ is derived from an analysis of historical data.

(2) The function for ‘y’ is described by the following formula:


y = a + bx1 + cx2 + dx3 + … + exn


where:

x1, x2, x3, …, xn = The various factors which affect the value of ‘y’

a = Fixed constant value

b, c, d, etc. = The marginal change in the value of ‘y’ caused by each particular factor

(3) The function for ‘y’ will, therefore, be impossible to draw on a two-dimensional graph, because there are three or more variables in the equation.

(4) The aim of multiple regression analysis is to improve predictions of the value of ‘y’ by recognizing that several different explanatory factors might be involved when the correlation between ‘y’ and any single independent variable is not high.

The disadvantage of multiple regression analysis is its relative complexity: a computer program would normally be needed to derive estimates of the ‘y’ function. However, provided that past data are a reliable guide to the future (i.e., if using historical data to predict the future is valid), multiple regression is likely to produce more accurate estimates.
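Such a program typically solves the least-squares normal equations (XᵀX)β = Xᵀy, where X holds a column of 1s (for the constant a) followed by the values of x1, x2, …. The Python sketch below is a minimal, dependency-free illustration, not a production routine; in practice a library function such as NumPy's numpy.linalg.lstsq would usually be preferred for numerical robustness. The observations are hypothetical: their ‘y’ values were generated from the ABC Ltd. formula in this article (3,000, 95, -65 and 1.5), so the fit should recover those coefficients.

```python
def fit_multiple_regression(rows, ys):
    """Least-squares coefficients [a, b, c, ...] for y = a + b*x1 + c*x2 + ...

    rows: one list [x1, x2, ...] per observation; ys: the observed y values.
    """
    # Design matrix with a leading 1 in each row for the constant term a
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    if len(X) < k:
        raise ValueError("need at least as many observations as coefficients")
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(k)]
    # Solve the augmented system by Gauss-Jordan elimination with partial pivoting
    m = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: factors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


# Hypothetical months: [working days, average temperature (C), employees]
rows = [
    [20, 30, 480],
    [22, 38, 500],
    [21, 25, 455],
    [23, 35, 520],
    [19, 28, 470],
    [24, 32, 530],
]
# y values generated from y = 3,000 + 95x1 - 65x2 + 1.5x3 (no noise)
ys = [3000 + 95 * d - 65 * t + 1.5 * e for d, t, e in rows]

coeffs = fit_multiple_regression(rows, ys)
print([round(c, 4) for c in coeffs])
```

The printed coefficients should be very close to 3,000, 95, -65 and 1.5. With real, noisy data the estimates would only approximate the underlying relationship, and at least as many months as coefficients are needed.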

With experience in its use, multiple regression analysis should prove more acceptable to supervisors than other estimating procedures that require a gross simplification of reality. It should provide better information, and its users will have more confidence in its predictions.


On analysis, the electricity costs per month in ABC Ltd. vary with the number of working days in the month, the average daily temperature outside the building during the month and the number of employees.

Using multiple regression analysis, a formula for estimating these costs per month has been derived as follows:

y = 3,000 + 95x1 – 65x2 + 1.5x3


x1 = Number of working days in the month

x2 = Average daily temperature (°C)

x3 = Number of employees

The actual electricity charges in June, 2009 were Rs. 3,000. During the month, there were 22 working days, the average daily temperature was 38 degrees Celsius and there were 500 employees.

What is the variance between actual costs and the costs that would have been expected?


Expected costs ‘y’ = 3,000 + 95(22) – 65(38) + 1.5(500) = 3,000 + 2,090 – 2,470 + 750 = Rs. 3,370

Expenditure variance = Rs. 3,370 – Rs. 3,000 = Rs. 370 (favourable, since actual costs were below expected costs)
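The ABC Ltd. calculation can be reproduced in a few lines of Python. This sketch hard-codes the article's formula; the function names are illustrative, and the favourable/adverse labelling follows the usual cost-accounting convention that spending less than expected is favourable.

```python
def expected_electricity_cost(working_days, avg_temp_c, employees):
    """Expected monthly electricity cost (Rs.) from the ABC Ltd. formula
    y = 3,000 + 95x1 - 65x2 + 1.5x3."""
    return 3000 + 95 * working_days - 65 * avg_temp_c + 1.5 * employees


def expenditure_variance(expected, actual):
    """Return (amount, label); cost below expectation is treated as favourable."""
    amount = abs(expected - actual)
    if actual < expected:
        return amount, "favourable"
    if actual > expected:
        return amount, "adverse"
    return amount, "nil"


expected = expected_electricity_cost(working_days=22, avg_temp_c=38, employees=500)
amount, label = expenditure_variance(expected, actual=3000)
print(f"Expected Rs. {expected:,.0f}; variance Rs. {amount:,.0f} ({label})")
# prints: Expected Rs. 3,370; variance Rs. 370 (favourable)
```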

Assumptions in Regression Analysis:

The following assumptions have to be made while using regression analysis:

(1) The relationship between the independent variable (x) and the dependent variable (y) is linear, i.e. a straight line. When this is not true, a linear model does not fit the data well and gives a weaker estimate of the actual relationship. The degree of linearity can be examined in the scatter-graph.

(2) The historical data points used to generate the regression line are normally distributed around the line (i.e. bell-shaped) for each ‘x’ value. This assumption can be tested by drawing the regression line on the scatter-graph and checking, for selected ‘x’ values, whether the points cluster predominantly close to the line and become fewer farther away from it.

(3) The dispersion of the data points around the line should be the same at all levels of ‘x’. The scatter-graph helps the user visually determine the degree to which this assumption is met.

(4) The ‘y’ values should be independent of each other; for example, overhead costs reported in July should not depend on those reported in June. Users can check this assumption based on their knowledge of the company's manufacturing operations.
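The checks above are visual. As a rough numeric complement, the Python sketch below fits a straight line, computes the residuals (actual minus fitted ‘y’), compares their spread for the lower and upper halves of the ‘x’ values (assumption 3), and computes the lag-1 autocorrelation of the residuals taken in time order (assumption 4). The data, the function names and any threshold for judging the results are assumptions, not rules from this article; normality (assumption 2) is still best judged from the scatter-graph or a plot of the residuals.

```python
import statistics


def fit_residuals(xs, ys):
    """Residuals (actual y minus fitted y) from a least-squares line y = a + bx."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def spread_ratio(xs, res):
    """Std. deviation of residuals for the upper half of x values divided by that
    for the lower half; values far from 1 hint at non-constant dispersion."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    half = len(order) // 2
    low = [res[i] for i in order[:half]]
    high = [res[i] for i in order[-half:]]
    return statistics.pstdev(high) / statistics.pstdev(low)


def lag1_autocorrelation(res):
    """Correlation of each residual with the previous one (time order);
    values near 0 are consistent with independent y values."""
    mean = sum(res) / len(res)
    num = sum((res[t] - mean) * (res[t - 1] - mean) for t in range(1, len(res)))
    den = sum((r - mean) ** 2 for r in res)
    return num / den


# Hypothetical monthly data in time order: units produced and overhead cost (Rs.)
units = [120, 180, 150, 210, 260, 230, 300, 280]
costs = [1310, 1640, 1450, 1800, 2090, 1880, 2330, 2250]

res = fit_residuals(units, costs)
print(f"spread ratio = {spread_ratio(units, res):.2f}")
print(f"lag-1 autocorrelation = {lag1_autocorrelation(res):.2f}")
```

With only a handful of observations these figures are noisy, so they support rather than replace the user's judgement and knowledge of the operation.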

Precautions in using Regression Analysis:

Users of regression should collect as many observations of the ‘x’ and ‘y’ variables as possible. Data may be drawn from many short time periods; for example, weekly costs will yield more observations than monthly amounts. However, with shorter time periods it is harder to match the values of the ‘x’ and ‘y’ variables within each period.

The user should make sure that the dependent and independent variables are matched to the proper period. If overhead cost measures are not properly related to the corresponding period of production, the actual underlying relationship will be obscured. Using longer time periods, however, also creates problems.

Obtaining enough observations from longer periods requires going back to many past periods whose observations may not reflect present conditions. Going further back in time runs the risk of differences due to changes in technology, inflation and product modifications. Using such data can cause the cost function to misrepresent the current relationship between ‘x’ and ‘y’.

A regression model should be rigorously tested before a great deal of reliance is placed on it. One method of testing is to build the model from data that exclude one period and then use it to predict the excluded period. Another option is to run regression alongside the present system of cost prediction and compare their performance.
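Leaving out a period, fitting on the rest, and checking the prediction for the period left out can be repeated for every period in turn; this is known as leave-one-out cross-validation. The Python sketch below does that for the simple line y = a + bx. The data are hypothetical, and the mean absolute error is only one of several possible ways to summarise the prediction errors.

```python
def fit_line(xs, ys):
    """Least-squares (a, b) for y = a + bx."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def holdout_errors(xs, ys):
    """For each period, fit on all other periods and return the prediction
    error (actual minus predicted) for the excluded period.
    Needs at least 3 periods so each training set has 2 distinct x values."""
    errors = []
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors.append(ys[i] - (a + b * xs[i]))
    return errors


# Hypothetical monthly data: units produced and total cost (Rs.)
units = [120, 180, 150, 210, 260, 230, 300, 280]
costs = [1310, 1640, 1450, 1800, 2090, 1880, 2330, 2250]

errors = holdout_errors(units, costs)
mae = sum(abs(e) for e in errors) / len(errors)
print(f"mean absolute error of held-out predictions = Rs. {mae:.2f}")
```

The same per-period errors could then be compared with those of the present estimating system, which is the second testing option described above.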

In any case, regression analysis can be an extremely useful tool for the managerial decision maker. However, like all decision models, it should be used with caution and with an understanding of its limitations.
