Correlation Analysis

Introduction to Correlation, Regression v/s Correlation

**INTRODUCTION TO CORRELATION**

Measures of central tendency and measures of dispersion deal with only one variable at a time (univariate data). For example, we may find the mean height of the students in a class, or the standard deviation of their heights. In both cases a single variable, height, is involved. Often, however, we need to deal with more than one variable simultaneously. For example, we may wish to find the relationship between the age of a child and his or her height. In such cases, two other statistical tools are used: correlation and regression.

In this lesson, we will study correlation analysis and regression analysis in detail.

**CORRELATION ANALYSIS**

**Types of Data (on basis of the number of variables)**

- Univariate Data: one variable is involved
- Bivariate Data: two variables are involved
- Multivariate Data: more than two variables are involved

**Meaning of Correlation**

Carefully observe your surroundings. You will notice many pairs of variables in which one variable is related to the other. Take, for example, the amount of rainfall and crop yield: the crop yield is directly related to the amount of rainfall. Similar relationships exist between many other pairs of variables, such as the price of a commodity and its supply, or the number of vehicles and the pollution level. The relationship between two variables is studied with the help of a statistical tool called correlation, which measures the degree and intensity of the relationship between the two variables.

Definitions of correlation given by different authors are as follows:

As per Croxton and Cowden, "When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation."

As per Boddington, "Whenever some definite connection exists between the two or more groups, classes or series of data, there is said to be a correlation."

As per A. M. Tuttle, "An analysis of the relationship of two or more variables is usually called correlation."

As per Connor, "Correlation analysis is the statistical tool that can be used to determine the degree to which one variable is related to the other."
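The definitions above speak of measuring the "degree" of a relationship without giving a formula at this point. As a minimal sketch, one widely used measure is Pearson's correlation coefficient, which ranges from -1 (perfect inverse relationship) to +1 (perfect direct relationship). The function name `pearson_r` and all the data values below are hypothetical, invented only to illustrate the rainfall/crop-yield and price/demand examples mentioned in the text.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical observations (not taken from the lesson).
rainfall_cm = [60, 75, 90, 105, 120]
crop_yield = [20, 24, 31, 35, 40]
price = [10, 12, 14, 16, 18]
demand = [50, 44, 41, 33, 30]

r_rain_yield = pearson_r(rainfall_cm, crop_yield)   # close to +1: direct relationship
r_price_demand = pearson_r(price, demand)           # close to -1: inverse relationship
print(round(r_rain_yield, 3), round(r_price_demand, 3))
```

The sign of the result indicates the direction of the relationship and its magnitude indicates the degree, which is the sense in which correlation is used in the rest of this lesson.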

**Significance of Correlation**

The study of correlation is important in understanding various practical problems.

i. **Formation of laws**: In economics, correlation analysis forms the basis for various theories and laws, such as the law of demand, the law of supply, and the concept of elasticity. For example, the law of demand is based on the relationship between the price of a commodity and its quantity demanded.

ii. **Degree and direction**: Correlation helps in measuring the degree and direction of the relationship between two variables. For example, besides establishing the relationship between the demand for a commodity and its price, it also helps in estimating the extent to which the two are related and in which direction.

iii. **Base for regression analysis**: Correlation serves as the base for regression analysis. Once it is established that two variables are correlated, the value of one variable, given the value of the other, can be estimated using regression analysis.

iv. **Business decisions and planning**: Correlation analysis is helpful in making important business decisions. For example, by looking at how an increase in production has led to an increase in profitability, future production plans can be made more easily.

v. **Helps in policy formation by the government**: As in business, correlation also helps the government in framing plans and policies. For example, policies on poverty alleviation can be framed on the basis of the correlation between expenditure on poverty alleviation programs and the percentage reduction in poverty.

**Bivariate Data**

We know that statistical measures such as central tendency and dispersion relate to only one variable. Distributions that relate to only one variable are known as univariate distributions. Other statistical measures, namely correlation and regression, deal with two variables simultaneously.

Data that relate to two variables are known as bivariate data, and the corresponding distributions are known as bivariate frequency distributions or two-way frequency distributions.

To understand the bivariate distribution, consider the example given below.

| Variable Y (rows) / Variable X (columns) | 5 - 10 | 10 - 15 | 15 - 20 | 20 - 25 | 25 - 30 | Total |
|---|---|---|---|---|---|---|
| 5 - 10 | I | I | II | I | I | 6 |
| 10 - 15 | I | IIII | IIIII | IIIII I | I | 17 |
| 15 - 20 | I | II | IIII | II | IIIII | 14 |
| 20 - 25 |  | I | III | IIIII | III | 12 |
| 25 - 30 |  |  | I |  |  | 1 |
| Total | 3 | 8 | 15 | 14 | 10 | 50 |
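As an illustrative sketch, the tally marks in the table above can be written as counts, and the row and column totals recomputed to check the table. Those totals, and any single row or column taken on its own, are what the next section calls the marginal and conditional distributions. The variable and function names are hypothetical. The placement of the blank cells in the last two rows is an assumption: it is the only placement for which every row and column total matches the totals shown.

```python
# Class intervals shared by both variables in the table.
classes = ["5-10", "10-15", "15-20", "20-25", "25-30"]

# Cell frequencies: each inner list is one class of Y, each position one class of X.
# Blank cells are taken as 0 (assumed placement, see the note above).
freq = [
    [1, 1, 2, 1, 1],   # Y in 5-10
    [1, 4, 5, 6, 1],   # Y in 10-15
    [1, 2, 4, 2, 5],   # Y in 15-20
    [0, 1, 3, 5, 3],   # Y in 20-25
    [0, 0, 1, 0, 0],   # Y in 25-30
]

# Marginal distributions: totals of each row (Y) and each column (X).
marginal_y = [sum(row) for row in freq]
marginal_x = [sum(col) for col in zip(*freq)]
total = sum(marginal_y)


def conditional_x_given_y(y_class):
    """Frequencies of X when Y is fixed to one class (one row of the table)."""
    row = freq[classes.index(y_class)]
    return dict(zip(classes, row))


def conditional_y_given_x(x_class):
    """Frequencies of Y when X is fixed to one class (one column of the table)."""
    j = classes.index(x_class)
    return dict(zip(classes, [row[j] for row in freq]))


print("Marginal of Y:", dict(zip(classes, marginal_y)))
print("Marginal of X:", dict(zip(classes, marginal_x)))
print("X given Y in 10-15:", conditional_x_given_y("10-15"))
```

Storing the table as a list of rows keeps the sketch dependency-free; `zip(*freq)` transposes it so that column totals can be computed the same way as row totals.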

**Marginal Distribution and Conditional Distribution**

From a bivariate distribution, two distributions can be derived. They are as follows:

i. Marginal distribution

ii. Conditional distribution

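**Marginal Distribution**: Marginal distribution is the frequency distribution of each variable taken individually, given by the frequency totals (marginal totals). In the table above, the Total column (6, 17, 14, 12, 1) is the marginal distribution of Y, and the Total row (3, 8, 15, 14, 10) is the marginal distribution of X.

**Conditional Distribution**: Under conditional distribution, the frequency values of one variable are obtained when the values of the other variable are given. For example, the row for Y in 10 - 15 (1, 4, 5, 6, 1) is the conditional distribution of X given that Y lies in 10 - 15.

**REGRESSION ANALYSIS**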

Correlation analysis helps in identifying…
