Chapter 4: Correlation and Regression (CAIIB – Paper 1)

1. What is the primary purpose of a scatter diagram in statistics?

  • A. To calculate mean and standard deviation
  • B. To determine the median of a dataset
  • C. To visually show the relationship between two variables
  • D. To display data in a tabular form
Scatter diagrams plot paired data points on a graph to visualize the relationship (positive, negative, or none) between two variables.

2. In a scatter diagram, if the points form a pattern sloping upwards from left to right, it indicates:

  • A. Positive correlation
  • B. Negative correlation
  • C. No correlation
  • D. Perfect regression
An upward sloping pattern in a scatter diagram indicates that as one variable increases, the other variable also tends to increase, showing positive correlation.

3. Which of the following statements about scatter diagrams is correct?

  • A. They only show categorical data
  • B. They are used to calculate variance
  • C. They can predict exact numerical values without regression
  • D. They help to identify the type and strength of relationship between two variables
Scatter diagrams are primarily used to observe the relationship between two quantitative variables and to detect patterns, trends, or correlation type.

4. When points in a scatter diagram are widely scattered with no visible pattern, it suggests:

  • A. Perfect positive correlation
  • B. No correlation
  • C. Negative correlation
  • D. Linear regression
When the points do not form any identifiable pattern, it indicates that there is no significant correlation between the two variables.

5. Which of the following can a scatter diagram NOT directly quantify?

  • A. Positive linear relationship
  • B. Negative linear relationship
  • C. Exact numerical strength of correlation
  • D. Non-linear trends
A scatter diagram visually shows patterns or trends but does not calculate the exact numerical value of correlation; that requires statistical formulas like Pearson’s r.

6. The correlation coefficient (r) measures:

  • A. Causation between variables
  • B. Mean deviation of variables
  • C. Regression slope only
  • D. The strength and direction of a linear relationship between two variables
The correlation coefficient (r) ranges from -1 to +1 and measures both the strength and direction (positive or negative) of a linear relationship between two variables.
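As an illustration of how r is obtained, here is a minimal Python sketch of Pearson's formula. The function name `pearson_r` and the spend/sales figures are hypothetical, chosen only for this example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired samples xs, ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: advertising spend vs. sales
spend = [1, 2, 3, 4, 5]
sales = [2, 4, 5, 4, 5]
r = pearson_r(spend, sales)
print(round(r, 3))  # 0.775, a fairly strong positive correlation
```

The sign of the result gives the direction of the relationship and its magnitude gives the strength.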

7. If the correlation coefficient between two variables is -0.85, it indicates:

  • A. Weak positive correlation
  • B. Strong negative correlation
  • C. No correlation
  • D. Perfect positive correlation
A negative correlation close to -1 indicates a strong inverse relationship; as one variable increases, the other tends to decrease.

8. Which of the following is TRUE about correlation?

  • A. Correlation implies causation
  • B. Correlation can only be positive
  • C. Correlation measures the degree of association but not causality
  • D. Correlation coefficient can exceed 1
Correlation quantifies the degree of association between variables but does not indicate causation. The correlation coefficient is always between -1 and +1.

9. Which type of correlation exists when one variable increases and the other decreases?

  • A. Negative correlation
  • B. Positive correlation
  • C. Zero correlation
  • D. Partial correlation
Negative correlation occurs when an increase in one variable corresponds to a decrease in the other variable.

10. A scatter diagram showing points clustered around a straight line suggests:

  • A. No correlation
  • B. Strong linear correlation
  • C. Weak non-linear correlation
  • D. Random distribution
When points in a scatter diagram closely follow a straight line, it indicates a strong linear relationship between the variables.

11. What is the main purpose of regression analysis?

  • A. To visualize data using a scatter diagram
  • B. To calculate standard deviation of a dataset
  • C. To predict the value of a dependent variable based on an independent variable
  • D. To find the median of a dataset
Regression analysis is used to establish a relationship between a dependent variable and one or more independent variables to predict future values.

12. In simple linear regression, the regression line of Y on X is represented by:

  • A. Y = a + bX + c
  • B. Y = a + bX
  • C. X = a + bY
  • D. Y = bX
In simple linear regression, Y is expressed as a linear function of X: Y = a + bX, where 'a' is the intercept and 'b' is the slope of the regression line.

13. The slope (b) of the regression line indicates:

  • A. Average value of Y
  • B. Correlation coefficient
  • C. Intercept with X-axis
  • D. Change in Y for a unit change in X
The slope 'b' measures how much the dependent variable Y changes when the independent variable X increases by one unit.

14. If the regression line of X on Y has a slope of 0, it indicates:

  • A. Perfect positive correlation
  • B. No linear relationship between X and Y
  • C. Perfect negative correlation
  • D. Partial correlation exists
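A slope of zero for the regression line of X on Y means the predicted value of X does not change with Y. Since this slope equals r × (σX / σY), it is zero only when r = 0; hence, there is no linear relationship between the variables.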

15. In regression analysis, the difference between the observed and predicted values of Y is called:

  • A. Residual
  • B. Correlation coefficient
  • C. Regression coefficient
  • D. Mean deviation
The residual is the error term in regression, representing the difference between the actual observed value and the value predicted by the regression line.
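A short sketch of the residual calculation, assuming a hypothetical fitted line Ŷ = 1 + 2X and three made-up observations:

```python
def residuals(xs, ys, a, b):
    """Observed Y minus predicted Y for the line Y-hat = a + b*X."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical observations and a hypothetical line Y-hat = 1 + 2X
xs = [1, 2, 3]
ys = [3.5, 4.5, 7.0]
res = residuals(xs, ys, a=1, b=2)
print(res)  # [0.5, -0.5, 0.0]
```

A positive residual means the point lies above the line, a negative one means it lies below, and zero means the point is exactly on the line.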

16. Which of the following is TRUE about regression lines?

  • A. The regression line always passes through the origin
  • B. There is only one regression line for X on Y
  • C. There are two regression lines: Y on X and X on Y
  • D. Regression lines cannot be used for prediction
In bivariate data, two regression lines can be calculated: one predicting Y from X (Y on X) and another predicting X from Y (X on Y).
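The two slopes can be computed side by side. The sketch below uses hypothetical data and also prints their product, which by a standard identity (not stated in the text above) equals r².

```python
def regression_slopes(xs, ys):
    """Return (b_yx, b_xy): slope of Y on X and slope of X on Y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Y on X divides by the spread of X; X on Y divides by the spread of Y
    return sxy / sxx, sxy / syy

# Hypothetical paired data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b_yx, b_xy = regression_slopes(xs, ys)
print(b_yx, b_xy)   # 0.6 1.0
print(b_yx * b_xy)  # 0.6, which is r squared for this data
```

The two lines coincide only when the correlation is perfect (r = ±1).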

17. The regression coefficient of Y on X is related to correlation (r) as:

  • A. b = r^2 × (σX / σY)
  • B. b = r × (σY / σX)
  • C. b = r × (σX × σY)
  • D. b = σY / r
The slope of the regression line of Y on X is calculated as b = r × (σY / σX), where r is the correlation coefficient and σX and σY are the standard deviations of X and Y.
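The relationship can be checked numerically. This is a sketch on hypothetical data; it uses population standard deviations (`statistics.pstdev`) for both variables, an assumption that does not affect the ratio σY / σX as long as the same convention is used for X and Y.

```python
import statistics

def slope_y_on_x(xs, ys):
    """Slope of Y on X computed as r * (sd_y / sd_x)."""
    n = len(xs)
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    sd_x = statistics.pstdev(xs)
    sd_y = statistics.pstdev(ys)
    # Population covariance, then r = cov / (sd_x * sd_y)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    r = cov / (sd_x * sd_y)
    return r * (sd_y / sd_x)

# Hypothetical paired data
b = slope_y_on_x([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b, 3))  # 0.6
```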

18. In perfect positive correlation, the regression line of Y on X:

  • A. Passes through all points and has positive slope
  • B. Passes through origin only
  • C. Is horizontal
  • D. Cannot be determined
Perfect positive correlation (r = +1) means all data points lie exactly on a straight line with a positive slope.

19. Which method is commonly used to estimate the regression line?

  • A. Pearson correlation method
  • B. Scatter diagram inspection
  • C. Residual plotting only
  • D. Method of least squares
The method of least squares minimizes the sum of squared residuals to find the best-fitting regression line.
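For simple linear regression the least-squares solution has a closed form: b = Σ(X - X̄)(Y - Ȳ) / Σ(X - X̄)² and a = Ȳ - bX̄. A minimal sketch, with a hypothetical function name and made-up data:

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line Y-hat = a + b*X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b

# Hypothetical paired data
a, b = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(a, 2), round(b, 2))  # 2.2 0.6
```

Because a = Ȳ - bX̄, the fitted line always passes through the point of means (X̄, Ȳ).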

20. In regression, the intercept (a) represents:

  • A. The slope of the line
  • B. The predicted value of Y when X = 0
  • C. The correlation coefficient
  • D. Residual error
The intercept 'a' is the point where the regression line crosses the Y-axis, representing the predicted value of Y when X equals zero.

21. What does the standard error of estimate measure in regression analysis?

  • A. The correlation between X and Y
  • B. The slope of the regression line
  • C. The average deviation of observed values from the predicted values
  • D. The intercept of the regression line
The standard error of estimate quantifies the dispersion of observed values around the regression line; it indicates how accurately the regression line predicts the dependent variable.

22. A smaller standard error of estimate indicates:

  • A. Greater variability of data around the regression line
  • B. Better predictive accuracy of the regression line
  • C. No relationship between variables
  • D. Negative correlation
A smaller standard error means that the observed data points are closer to the regression line, indicating more reliable predictions.

23. The formula for the standard error of estimate (Se) is:

  • A. Se = √[Σ(Yi - Ȳ)² / n]
  • B. Se = Σ(Yi - Ŷi) / n
  • C. Se = Σ(Xi - X̄)² / n
  • D. Se = √[Σ(Yi - Ŷi)² / (n-2)]
The standard error of estimate is the square root of the sum of squared differences between observed values (Yi) and predicted values (Ŷi) divided by (n-2). The divisor is (n-2) in simple linear regression because two parameters, the intercept and the slope, are estimated from the data.
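A minimal sketch of this calculation from lists of observed and predicted values. The function name and the numbers are hypothetical.

```python
import math

def standard_error_of_estimate(observed, predicted):
    """Square root of [sum of squared residuals / (n - 2)]."""
    n = len(observed)
    if n <= 2:
        raise ValueError("need more than two observations")
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    return math.sqrt(sse / (n - 2))

# Hypothetical observed and predicted values
observed = [3, 5, 7, 10]
predicted = [2, 5, 8, 10]
se = standard_error_of_estimate(observed, predicted)
print(se)  # 1.0, since the squared residuals sum to 2 and n - 2 = 2
```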

24. In regression analysis, the standard error of estimate is zero when:

  • A. All observed values lie exactly on the regression line
  • B. Correlation coefficient is zero
  • C. Slope of regression line is zero
  • D. Intercept is zero
A standard error of estimate of zero occurs when all observed values perfectly match the predicted values on the regression line, indicating perfect prediction.

25. The standard error of estimate is closely related to which statistical measure?

  • A. Mean of X
  • B. Correlation coefficient between X and Y
  • C. Median of Y
  • D. Variance of X only
For a given spread of Y, the standard error of estimate decreases as the absolute value of the correlation between X and Y increases, indicating that stronger linear relationships lead to more precise predictions.
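The link can be written explicitly. For a least-squares line, the sum of squared residuals equals (1 - r²) times the total sum of squares of Y (a standard identity, stated here as background rather than taken from the question), so:

```latex
S_e = \sqrt{\frac{\sum (Y_i - \hat{Y}_i)^2}{n-2}}
    = \sqrt{\frac{(1 - r^2)\,\sum (Y_i - \bar{Y})^2}{n-2}}
```

When r = ±1 the factor (1 - r²) is zero and so is Se; when r = 0 the factor is 1 and Se is as large as the variability of Y allows.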

26. A regression line of Y on X is given by Ŷ = 50 + 2X. If X = 10, the predicted value of Y is:

  • A. 50
  • B. 20
  • C. 70
  • D. 60
Substitute X = 10 in Ŷ = 50 + 2X: Ŷ = 50 + 2 × 10 = 70.

27. In a dataset, Σ(Yi - Ŷi)² = 200 and n = 12. The standard error of estimate is:

  • A. √(200 / 12) = 4.08
  • B. √(200 / 10) = 4.47
  • C. 200 / 10 = 20
  • D. 200 / 12 = 16.67
Standard error Se = √[Σ(Yi - Ŷi)² / (n-2)] = √(200 / 10) = √20 ≈ 4.47.

28. If the correlation coefficient r = 0.9, the standard error of estimate will be:

  • A. Small, indicating high predictive accuracy
  • B. Large, indicating poor prediction
  • C. Zero, always
  • D. Negative
A high correlation (r close to 1) means observed values lie close to the regression line, resulting in a small standard error of estimate.

29. Caselet: A bank manager wants to predict monthly sales (Y) based on advertising expense (X). The regression line obtained is Ŷ = 100 + 5X. If X = 20, what is the predicted sales?

  • A. 100
  • B. 105
  • C. 110
  • D. 200
Predicted sales Ŷ = 100 + 5 × 20 = 100 + 100 = 200.

30. In a regression analysis, if all residuals (Yi - Ŷi) are zero, then:

  • A. Regression slope is zero
  • B. Standard error of estimate is zero, indicating perfect fit
  • C. Correlation coefficient is zero
  • D. Intercept is negative
If all observed values match predicted values perfectly, residuals are zero and standard error of estimate = 0, indicating a perfect regression fit.

31. Caselet: A bank wants to estimate loan defaults (Y) based on credit score (X). Regression line: Ŷ = 50 - 0.3X. If credit score X = 120, estimated defaults are:

  • A. 50
  • B. 36
  • C. 14
  • D. -10
Predicted Y = 50 - 0.3 × 120 = 50 - 36 = 14.
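The prediction questions (26, 29 and 31) all apply the same substitution into Ŷ = a + bX. A small sketch, with `predict` as a hypothetical helper name:

```python
def predict(a, b, x):
    """Predicted Y from the regression line Y-hat = a + b*X."""
    return a + b * x

q26 = predict(50, 2, 10)      # Y-hat = 50 + 2X at X = 10
q29 = predict(100, 5, 20)     # Y-hat = 100 + 5X at X = 20
q31 = predict(50, -0.3, 120)  # Y-hat = 50 - 0.3X at X = 120
print(q26, q29, round(q31, 2))
```

A negative slope, as in question 31, simply means the predicted value falls as X rises.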

32. The residuals in regression analysis are useful for:

  • A. Calculating slope only
  • B. Calculating correlation
  • C. Finding intercept only
  • D. Checking model fit and identifying outliers
Residuals show the difference between observed and predicted values, helping to check model fit and detect outliers.

33. Caselet: In a regression study, Σ(Yi - Ŷi)² = 150, n = 8. The standard error of estimate is:

  • A. √(150 / 8) = 4.33
  • B. √(150 / 6) = 5.0
  • C. 150 / 6 = 25
  • D. 150 / 8 = 18.75
Standard error Se = √[Σ(Yi - Ŷi)² / (n-2)] = √(150 / 6) = √25 = 5.
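Questions 27 and 33 give the sum of squared residuals directly, so the calculation reduces to one line. A sketch, with `se_from_sse` as a hypothetical helper name:

```python
import math

def se_from_sse(sse, n):
    """Standard error of estimate from the sum of squared residuals."""
    return math.sqrt(sse / (n - 2))

q27 = se_from_sse(200, 12)  # sqrt(200 / 10)
q33 = se_from_sse(150, 8)   # sqrt(150 / 6)
print(round(q27, 2), q33)   # 4.47 5.0
```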

34. A regression line Y on X has correlation r = 0. What is the standard error of estimate?

  • A. Zero
  • B. Negative
  • C. Maximum possible for given Y variance
  • D. Equal to slope
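If the correlation is zero, the regression line is the horizontal line Ŷ = Ȳ and explains none of the variation in Y. The sum of squared residuals then equals the total sum of squares Σ(Yi - Ȳ)², so the standard error of estimate takes its maximum value for the given variability of Y.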
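This can be seen on a small hypothetical dataset constructed so that r = 0. The sketch fits the least-squares line and compares the residual sum of squares with the total sum of squares of Y.

```python
import math

def fit_and_errors(xs, ys):
    """Return (slope, residual SS, total SS, standard error of estimate)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return b, sse, sst, math.sqrt(sse / (n - 2))

# Hypothetical data chosen so the cross-deviations cancel (r = 0)
xs = [1, 2, 3, 4, 5]
ys = [3, 1, 2, 1, 3]
b, sse, sst, se = fit_and_errors(xs, ys)
print(b, sse, sst)   # 0.0 4.0 4.0
print(round(se, 3))  # 1.155
```

With a zero slope every prediction is just the mean of Y, so knowing X does not reduce the prediction error at all.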

35. In regression analysis, decreasing the spread of data points around the regression line will:

  • A. Increase standard error of estimate
  • B. Decrease standard error of estimate
  • C. Not affect standard error
  • D. Invert regression slope
Smaller spread around the regression line means residuals are smaller, leading to a lower standard error and more accurate predictions.
