Coefficient of Correlation (With Diagram)

In this article we will discuss about the coefficient of correlation.

A cost line, and thus a cost behaviour pattern, can be established using the various methods described above from any set of data. Whether the values obtained for unit variable cost and total fixed cost are of practical use depends upon whether there is a clear causal relationship and sufficient evidence of correlation.


Whilst it is reasonable to assume that there is a causal relationship between activity and cost, the question is whether it is as straightforward as that suggested by a linear relationship and by the particular cost function calculated. Even if there is a linear relationship, the values produced by the methods illustrated in this section, including least-squares regression, may be unreliable if the data used are unrepresentative.

In regression analysis the squaring process gives considerable prominence to large fluctuations. The cost function determined by regression analysis can be tested to establish how well the estimated relationship explains the variation in the observations.

The coefficient of correlation, ‘r’, measures the extent to which changes in the output variable explain the changes in total costs. The strength of this relationship may be assessed by calculating the correlation coefficient and the coefficient of determination. Some examples of statistical relationships between data are shown in figure 2.5.

Illustrative Statistical Relationships

The degree of correlation between two variables can be measured, and we can decide, using actual results (pairs of data), whether two variables are perfectly or partially correlated.

If they are partially correlated, we can determine whether there is a high or low degree of partial correlation. This degree of correlation is measured by the correlation coefficient. There are several formulae for ascertaining the correlation coefficient, although each formula should give the same value.


The following is the standard formula used in measuring the correlation coefficient:

r = (nΣxy − ΣxΣy) / √{[nΣx² − (Σx)²] × [nΣy² − (Σy)²]}

Where ‘x’ and ‘y’ represent pairs of data for two variables and ‘n’ is the number of pairs of data for the two variables. The above formula is used in subsequent examples in ascertaining the correlation coefficient. The value of ‘r’ must always fall between -1 and +1. If you get a value outside this range you have made a mistake.

r = +1 means that the variables are perfectly positively correlated.


r = -1 means that the variables are perfectly negatively correlated.

r = 0 means that the variables are un-correlated.
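As a sketch only, the standard formula above can be written as a short Python function (the function name and the sample pairs are my own, not from the text); the two calls reproduce the boundary cases of perfect positive and perfect negative correlation:

```python
import math

def correlation_coefficient(x, y):
    """r = (nΣxy − ΣxΣy) / sqrt([nΣx² − (Σx)²][nΣy² − (Σy)²])."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Perfectly positively correlated pairs give r = +1
print(correlation_coefficient([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0
# Perfectly negatively correlated pairs give r = -1
print(correlation_coefficient([1, 2, 3, 4], [40, 30, 20, 10]))  # -1.0
```

Any value returned outside the range −1 to +1 would indicate an arithmetic mistake, as noted above.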

Illustration 2:

Sales of product A between 2005 and 2009 were as follows:

Is there a trend in sales? In other words, is there any correlation between the year and the number of units sold?


There is partial negative correlation between the year of sale and units sold. The value of ‘r’ is close to -1, therefore, a high degree of correlation exists, although it is not quite perfect correlation. This means that there is a clear downward trend in sales, which is close to being a straight downward trend line.
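The article’s sales table is not reproduced here, but the same calculation can be sketched on invented figures with a similar downward trend (these are illustrative numbers, not the original data):

```python
import math

# Hypothetical year/sales pairs showing a near-straight downward trend;
# NOT the article's actual table, which is not reproduced here.
years = [2005, 2006, 2007, 2008, 2009]
units = [980, 900, 850, 790, 700]

n = len(years)
sx, sy = sum(years), sum(units)
sxy = sum(x * y for x, y in zip(years, units))
sxx = sum(x * x for x in years)
syy = sum(y * y for y in units)
r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
print(round(r, 3))  # close to -1: a high degree of negative correlation
```

A value of ‘r’ this close to −1 signals a strong, though not perfect, downward trend of the kind described above.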

Coefficient of Determination:

Unless the correlation coefficient ‘r’ is exactly or very nearly +1, -1 or 0, its meaning is a little inexact. For example, if the correlation coefficient for two variables is +0.8, this would tell us that the variables are positively correlated, but the correlation is not perfect. It does not really tell us much else. A more meaningful analysis is available from the square of the correlation coefficient (r²), which is called the ‘coefficient of determination’.

What ‘r²’ measures is the proportion of the total variation in the value of ‘y’ that can be explained by variations in the value of ‘x’. In the above illustration, r = -0.992; therefore r² = 0.984. This means that over 98% of the variation in sales can be explained by the passage of time, leaving 0.016 (less than 2%) of the variation to be explained by other factors.

Similarly, if the correlation coefficient for output volume and maintenance cost is 0.9, r² would be 0.81, meaning that 81% of variations in maintenance costs could be predicted by variations in output volume, leaving only 19% of variations to be explained by other causes (for example, the age of equipment).
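Both worked values of r² can be checked in a few lines (the ‘r’ values are taken from the text; the variable names are my own):

```python
# r from the sales illustration (-0.992) and the maintenance-cost example (0.9)
for r in (-0.992, 0.9):
    r2 = r * r
    explained = round(r2, 3)        # proportion of variation in 'y' explained by 'x'
    unexplained = round(1 - r2, 3)  # proportion left to other factors
    print(explained, unexplained)
# prints "0.984 0.016" then "0.81 0.19"
```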

In the two illustrations above, it would be reasonable to conclude that there is a high degree of correlation between ‘x’ and ‘y’, and that predicted values of ‘x’ could therefore be used to estimate expected values of ‘y’ with reasonable confidence in the accuracy of the predictions. (If r = 1, r² would also be 1, meaning that when there is perfect correlation, variations in ‘y’ can be predicted entirely by variations in ‘x’.)

Note, however, that if r² = 0.81, we would say that 81% of the variations in ‘y’ can be predicted by variations in ‘x’. We do not conclude that 81% of the variations in ‘y’ are necessarily caused by the factor ‘x’.

We must beware of reading too much importance into our statistical analysis. For example, if a company’s sales volume is increasing over time, and ‘r’ for time and sales volume is, say, 0.95, we could predict that sales will vary over time, but it would be wrong to say that time itself causes the variation in sales.

