Chapter 1: Definition of Statistics, Importance & Limitations & Data Collection, Classification & Tabulation (CAIIB – Paper 1)

CAIIB ABM | Module A | Ch.1 — Statistics: Definition, Importance & Data Collection
Definitions of Statistics
🔴 High Frequency
1
MCQ Easy Repeated every exam
The word "Statistics" in the sentence — "The Statistics of NPAs published by RBI show an improving trend" — is used in its:
💡 Explanation
Statistics has two meanings:
Plural (Data): Refers to numerical facts/figures themselves — e.g., "NPA statistics," "deposit statistics." This is the meaning in the question.
Singular (Science): The discipline of collecting, organising, analysing and interpreting data — e.g., "Statistics is a branch of applied mathematics."
💡 Memory Tip: "The statistics SHOW" → Plural (data). "Statistics IS a science" → Singular (method).
2
MCQ Easy Definition — Croxton & Cowden
The most comprehensive and widely accepted definition of Statistics is given by:
💡 Explanation
Croxton & Cowden's definition is considered the most complete — it covers the entire statistical process: Collection → Presentation → Analysis → Interpretation.

Bowley's definition ("science of counting / science of averages") is incomplete — only covers one aspect.
Secrist's definition focuses on the characteristics of statistical data, not the process.
💡 CAIIB Tip: If asked "which definition is incomplete?" — answer is Bowley. If asked "which is most comprehensive?" — answer is Croxton & Cowden.
3
MCQ Medium Secrist — Characteristics
According to Horace Secrist, which of the following is NOT a characteristic of statistical data?
💡 Explanation
Secrist's 5 key characteristics of statistical data:
1. Aggregate of facts — a single fact is NOT statistics. Multiple facts = statistics.
2. Numerically expressed — must be quantifiable.
3. Collected systematically — haphazard collection is unreliable.
4. Predetermined purpose — collected with a specific objective.
5. Placed in relation to each other — data must be comparable.
💡 Option C is WRONG as a characteristic because a single fact does NOT constitute statistics — it must be an aggregate.
Functions & Importance of Statistics
🔴 High Frequency
4
MCQ Easy Function — Condensation
A bank's annual report presents millions of transactions as just three figures — Total Deposits, Total Advances, and NPA ratio. This illustrates which function of Statistics?
💡 Explanation
Condensation (Simplification) is one of the primary functions of Statistics — it reduces large volumes of complex data into a few representative figures (averages, ratios, percentages, graphs) that are easy to understand and communicate.

Key functions of Statistics to remember:
Condensation — reducing mass data
Comparison — across branches, time periods
Forecasting — predicting future trends
Policy formulation — data-driven decisions
Hypothesis testing — validating assumptions
Establishing relationships — correlation, regression
5
MCQ Medium NOT a function — Trick Q
Which of the following is NOT a function of Statistics?
💡 Explanation
Options A, B, and C are genuine functions of Statistics.

Option D is actually a Limitation presented as a false function — Statistics cannot provide certainty at the individual level. It is probabilistic, not deterministic.
⚠️ Exam Trap: In "NOT a function" questions, options with words like "guarantees," "absolute certainty," "every individual" are always limitations, not functions.
Limitations / Demerits of Statistics
🔴 High Frequency
6
MCQ Easy Limitation — Aggregates only
A credit analyst says: "The average NPA of our portfolio is 4%, therefore no individual account will default." This statement violates which limitation of Statistics?
💡 Explanation
The limitation violated is: "Statistical laws hold true on average, not for every individual."

A 4% average NPA describes the portfolio as a whole: roughly 4% of it is non-performing. It says nothing about which specific account will default; any individual account may or may not default, whatever the portfolio average. This is why individual credit appraisal cannot be replaced by portfolio statistics alone.
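The gap between the portfolio average and individual outcomes can be sketched in a few lines of Python. The portfolio below is hypothetical (100 made-up accounts, 4 of them in default) and is only meant to illustrate the point.

```python
# Hypothetical portfolio: 1 = account defaulted, 0 = account performing.
# All figures are made up for illustration.
outcomes = [1] * 4 + [0] * 96

# Portfolio-level statistic: the average default rate in percent.
npa_rate = 100 * sum(outcomes) / len(outcomes)
print(f"Portfolio NPA rate: {npa_rate:.1f}%")

# Account-level reality: each account either defaulted or did not.
defaulted = [i for i, flag in enumerate(outcomes) if flag == 1]
print(f"Accounts in default: {len(defaulted)} of {len(outcomes)}")
```

The average is 4%, yet no single account is "4% defaulted": four accounts defaulted fully and the rest did not.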
7
MCQ Medium Limitation — Misuse / Disraeli
"There are three kinds of lies — lies, damned lies, and statistics." This quote (attributed to Disraeli) highlights which limitation of Statistics?
💡 Explanation
This reflects the limitation that Statistics can be misused. By selectively choosing what data to show, one can make statistics support almost any argument.

Banking Example: A bank may highlight loan growth (+20%) while hiding NPA growth (+50%). An advertiser uses "9 out of 10 dentists recommend..." without revealing the sample size was only 10.
💡 This is a regularly asked conceptual question in CAIIB. The Disraeli quote = Misuse of Statistics.
8
MCQ Medium Limitation — Qualitative data
Customer satisfaction, employee morale and brand reputation CANNOT be directly studied by Statistics because of which limitation?
💡 Explanation
Statistics deals with numerical (quantitative) data. Qualitative characteristics like satisfaction, morale, or reputation cannot be directly measured statistically.

To use statistics, qualitative data must first be converted to numbers — e.g., a 1–10 satisfaction scale, a 5-point Likert scale, or an NPS (Net Promoter Score). Only then can statistical tools be applied.
💡 Key point: This limitation does not mean qualitative data is ignored; it means such data must be quantified before statistical analysis can be applied.
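As a rough sketch of "quantify first, analyse second", the snippet below maps survey answers to a 5-point Likert scale and then takes the mean. The label wording, the `LIKERT` mapping and the responses are assumptions made up for illustration.

```python
from statistics import mean

# Hypothetical 5-point Likert mapping (labels are illustrative).
LIKERT = {
    "Very dissatisfied": 1,
    "Dissatisfied": 2,
    "Neutral": 3,
    "Satisfied": 4,
    "Very satisfied": 5,
}

# Made-up qualitative survey answers.
responses = ["Satisfied", "Very satisfied", "Neutral", "Satisfied", "Dissatisfied"]

# Step 1: convert each qualitative answer to a number.
scores = [LIKERT[r] for r in responses]

# Step 2: only now can a statistical tool (here, the mean) be applied.
average_score = mean(scores)
print(scores, average_score)
```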
9
MCQ Hard Correlation ≠ Causation
A bank finds a high statistical correlation between metro branch location and high deposit levels, and concludes that "opening branches in metros CAUSES deposit growth." This is an example of:
💡 Explanation
Correlation ≠ Causation — this is one of the most important and frequently tested principles in statistics.

Two variables may move together (correlate) without one causing the other. In this case, high income levels, existing financial activity, and infrastructure in metros, and not branch location alone, are the more likely drivers of deposit growth.
⚠️ Classic CAIIB trap: "High correlation PROVES that..." is always wrong. Correlation only shows association — domain expertise is needed to establish causation.
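A small sketch of how a third factor can produce a high correlation. The city-level figures below are invented so that both branch count and deposits follow income; `pearson` is a plain implementation of the Pearson correlation coefficient written for this example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data for five cities (all values made up).
income = [10, 20, 30, 40, 50]          # income index
branches = [2, 4, 5, 8, 10]            # number of branches
deposits = [100, 210, 290, 405, 500]   # deposits, arbitrary units

r_branches = pearson(branches, deposits)
r_income = pearson(income, deposits)
print(round(r_branches, 3), round(r_income, 3))
```

Both coefficients come out close to 1. The number alone cannot say whether branches, income, or something else drives deposits; that judgement needs domain knowledge.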
Collection of Data — Primary vs Secondary
🔴 High Frequency
10
MCQ Easy Primary vs Secondary — Identification
Which of the following is an example of PRIMARY data collection?
💡 Explanation
Primary Data: Collected directly by the investigator for the first time for a specific purpose.
Secondary Data: Already collected by someone else — used second-hand.

Options A (RBI), B (CIBIL), and D (NABARD) are all secondary data — collected by external agencies and reused by the bank.
Option C — the bank is collecting data itself for the first time → Primary Data.
💡 Quick Rule: Did YOU collect it for the FIRST TIME? → Primary. Did someone ELSE collect it and you're using it? → Secondary.
11
MCQ Medium Primary vs Secondary — Comparison
Which of the following statements about Primary and Secondary data is CORRECT?
💡 Explanation
Key differences between Primary and Secondary data:

Primary Data: Collected fresh, expensive, time-consuming, highly relevant to the specific study objective.
Secondary Data: Already available, cheap and quick to access, but may not perfectly match the study's needs and could be outdated.

Option A is wrong — secondary data from a poorly conducted source can be less accurate than good primary research.
Option B is wrong — primary data collection is actually more expensive and time-consuming.
💡 Best practice: Use secondary data first (cheap, quick). Collect primary data only to fill specific gaps not covered by secondary sources.
12
MCQ Medium Reliable Secondary Source
For analysing sectoral credit deployment trends in India, the MOST reliable secondary data source for a bank would be:
💡 Explanation
Secondary data must be evaluated for Reliability, Suitability, Adequacy, and Accuracy.

Most reliable secondary sources for banking:
• RBI publications (Annual Report, Trend & Progress of Banking, Handbook of Statistics)
• NABARD (for agriculture finance)
• SEBI, SIDBI, NHB (for their respective sectors)
• National Statistical Office (NSO)

Option D (internal MIS) is reliable for the bank's OWN data, but cannot provide INDUSTRY-WIDE trends.
Classification of Data
🟡 Medium Frequency
13
MCQ Easy Classification Types — Identify
Branch-wise deposit data showing deposits of each of 5,000 branches across different states of India is an example of which type of classification?
💡 Explanation
Four types of classification:

Chronological — basis is Time (e.g., quarterly NPA data FY21–FY25)
Geographical — basis is Place/Location (e.g., state-wise deposits, branch-wise data) ✓
Qualitative — basis is an Attribute/non-numerical quality (e.g., loans by sector: Agri, MSME, Retail)
Quantitative — basis is a measurable numeric value (e.g., loans by amount: <1L, 1–5L, >5L)
💡 Quick Rule: Time? → Chronological. Place? → Geographical. Category/Type? → Qualitative. Number/Amount? → Quantitative.
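The same records can be classified four ways depending on the basis chosen. A minimal sketch, assuming a handful of made-up loan records and a hypothetical `amount_band` helper that uses the <1L, 1–5L, >5L bands from the example above:

```python
from collections import Counter

# Hypothetical loan records; every value is made up for illustration.
loans = [
    {"year": "FY24", "state": "Kerala", "sector": "Agri", "amount": 80_000},
    {"year": "FY24", "state": "Punjab", "sector": "MSME", "amount": 300_000},
    {"year": "FY25", "state": "Kerala", "sector": "Retail", "amount": 700_000},
    {"year": "FY25", "state": "Punjab", "sector": "Agri", "amount": 450_000},
]

def amount_band(amount):
    """Quantitative basis: place a loan amount into a size band."""
    if amount < 100_000:
        return "<1L"
    if amount <= 500_000:
        return "1-5L"
    return ">5L"

chronological = Counter(loan["year"] for loan in loans)    # basis: time
geographical = Counter(loan["state"] for loan in loans)    # basis: place
qualitative = Counter(loan["sector"] for loan in loans)    # basis: attribute
quantitative = Counter(amount_band(loan["amount"]) for loan in loans)  # basis: amount

print(chronological, geographical, qualitative, quantitative, sep="\n")
```

The data never changes; only the grouping key does, which is why the basis of classification decides the type.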
14
MCQ Medium Classification — Applied scenario
RBI classifies bank loans into Agriculture, MSME, Retail, and Corporate sectors. This type of data classification is:
💡 Explanation
Classifying loans into sectors (Agriculture, MSME, Retail, Corporate) uses a non-numerical attribute — the sector type. This is Qualitative Classification.

It would be Quantitative only if classified by a measurable value — e.g., loan amount ranges (₹0–1L, ₹1L–5L). The classification basis is what determines the type, not the data itself.
Tabulation & Frequency Distribution
🔴 High Frequency
15
MCQ Easy Exclusive vs Inclusive — Very common
A bank classifies loan accounts using exclusive class intervals: 0–50,000 | 50,000–1,00,000 | 1,00,000–2,00,000. A loan account with outstanding balance of EXACTLY ₹50,000 will be placed in which class?
💡 Explanation
In Exclusive Class Intervals: the lower limit is included, the upper limit is excluded [lower, upper).
Class 0–50,000 includes values: ₹0 to ₹49,999.99
Exactly ₹50,000 goes into the NEXT class → 50,000–1,00,000.

In Inclusive Class Intervals (e.g., 0–49,999 | 50,000–99,999): both limits are included. ₹50,000 would go in 50,000–99,999.
💡 Exclusive intervals are used for continuous data (loan amounts, interest rates). Inclusive intervals for discrete data (number of accounts).
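The two placement rules can be checked with a short sketch. The function names `exclusive_class` and `inclusive_class` are illustrative, not from any library.

```python
def exclusive_class(value, boundaries):
    """Return the [lower, upper) class containing value, or None."""
    for lower, upper in zip(boundaries, boundaries[1:]):
        if lower <= value < upper:    # lower limit included, upper excluded
            return (lower, upper)
    return None

def inclusive_class(value, classes):
    """Return the [lower, upper] class containing value, or None."""
    for lower, upper in classes:
        if lower <= value <= upper:   # both limits included
            return (lower, upper)
    return None

exclusive_bounds = [0, 50_000, 100_000, 200_000]
inclusive_classes = [(0, 49_999), (50_000, 99_999)]

print(exclusive_class(50_000, exclusive_bounds))    # (50000, 100000)
print(inclusive_class(50_000, inclusive_classes))   # (50000, 99999)
```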
16
Numerical Medium Relative Frequency — Calculation
A bank has 500 retail loan accounts. The class "₹5L–₹10L outstanding" has 125 accounts. The relative frequency of this class is:
💡 Explanation
Relative Frequency = (Class Frequency / Total Frequency) × 100

= (125 / 500) × 100 = 25%

Relative frequency expresses each class's proportion as a percentage of the total. The sum of all relative frequencies = 100%.
In decimal form: 125/500 = 0.25 (not 0.025 as in Option D).
💡 Relative frequency is more useful than absolute frequency for comparing two portfolios of different sizes.
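The calculation above, as a minimal sketch. The second portfolio (200 accounts in the class out of 1,000) is a made-up figure added only to show the comparison across portfolio sizes.

```python
def relative_frequency(class_frequency, total_frequency):
    """Relative frequency of a class as a percentage of the total."""
    return 100 * class_frequency / total_frequency

portfolio_a = relative_frequency(125, 500)     # figures from the question
portfolio_b = relative_frequency(200, 1000)    # hypothetical second portfolio

print(portfolio_a)   # 25.0
print(portfolio_b)   # 20.0
```

Portfolio B has more accounts in the class in absolute terms (200 vs 125), but a smaller share of its total.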
17
Numerical Medium Cumulative Frequency — Very common
The frequency distribution of overdue days for 200 loan accounts is: 0–30: 100 | 31–60: 50 | 61–90: 30 | 91–120: 15 | Above 120: 5. What is the cumulative frequency for "less than 90 days overdue"?
💡 Explanation
"Less than 90 days" covers the first three classes: 0–30, 31–60, and 61–90. (Strictly, the inclusive class 61–90 also contains accounts overdue by exactly 90 days, so the figure is "90 days or less".)

CF (up to 90 days) = 100 + 50 + 30 = 180 accounts

This means 180 out of 200 accounts (90%) are overdue by 90 days or less, i.e., they are NOT yet NPA under RBI's 90-day rule.
The remaining 20 accounts (15 + 5 = 20) are NPA (overdue > 90 days) = 10% NPA rate.
💡 Cumulative frequency is built by adding frequencies progressively from the lowest class upward. A very practical tool for NPA portfolio analysis.
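The running total can be sketched with `itertools.accumulate`, using the frequencies from the question. The variable names are illustrative.

```python
from itertools import accumulate

classes = ["0-30", "31-60", "61-90", "91-120", "Above 120"]
frequencies = [100, 50, 30, 15, 5]

# Add frequencies progressively from the lowest class upward.
cumulative = list(accumulate(frequencies))
for label, cf in zip(classes, cumulative):
    print(f"up to class {label}: {cf}")

# Cumulative frequency through the 61-90 class, and its share of the total.
up_to_90 = cumulative[classes.index("61-90")]
share = 100 * up_to_90 / cumulative[-1]
print(up_to_90, share)   # 180 90.0
```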
18
MCQ Medium Tabulation — Parts of Table
In a statistical table, the column headings that describe the data presented in each column are called:
💡 Explanation
Parts of a statistical table:

Title — heading describing subject, place and period
Caption — column headings (top of each column) ✓
Stub — row headings (left side of each row)
Body — the actual numerical data in the cells
Head note — clarification below the title (e.g., "Figures in ₹ crore")
Footnote — specific explanations for certain cells
Source note — origin of data (e.g., "Source: RBI 2025")
19
Numerical Hard Inclusive to Exclusive Conversion
A frequency table uses inclusive class intervals: 10–19, 20–29, 30–39. When converted to exclusive class intervals, the class 20–29 becomes:
💡 Explanation
Conversion Rule: Gap between consecutive inclusive classes = 20 − 19 = 1. Correction factor = gap/2 = 0.5.

New Lower Limit = Old Lower − 0.5  |  New Upper Limit = Old Upper + 0.5

So: 20–29 → 19.5–29.5
Similarly: 10–19 → 9.5–19.5  |  30–39 → 29.5–39.5

This makes intervals continuous — the upper limit of one class equals the lower limit of the next. Required for accurate histogram construction and median/mode calculation.
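A minimal sketch of the conversion rule, assuming at least two classes and the same gap between every pair of consecutive classes. The function name `to_exclusive` is illustrative.

```python
def to_exclusive(classes):
    """Convert inclusive (lower, upper) classes to continuous boundaries."""
    gap = classes[1][0] - classes[0][1]   # e.g. 20 - 19 = 1
    correction = gap / 2                  # e.g. 0.5
    return [(lower - correction, upper + correction) for lower, upper in classes]

converted = to_exclusive([(10, 19), (20, 29), (30, 39)])
print(converted)   # [(9.5, 19.5), (19.5, 29.5), (29.5, 39.5)]
```

After conversion, each upper boundary equals the next lower boundary, which is the continuity property described above.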
20
MCQ Hard Ogive — Median Identification
In a "Less than" Ogive (cumulative frequency curve), the Median of the distribution is located by:
💡 Explanation
Method to find Median from a Less-than Ogive:
1. On the Y-axis, locate N/2 (half the total frequency)
2. Draw a horizontal line from N/2 to meet the Ogive curve
3. From that intersection point, drop a vertical line to the X-axis
4. The X-value where the vertical meets the X-axis = Median

Alternative method: Plot both Less-than and More-than Ogives — their intersection point's X-coordinate also gives the Median.
💡 Formula method: Median = L + [(N/2 − CF) / f] × h, where L = lower limit of median class, CF = cumulative frequency before median class, f = frequency of median class, h = class width.
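The formula method can be sketched as follows. The class boundaries and frequencies in the usage line are hypothetical, and `grouped_median` is an illustrative name; the median class is taken as the first class whose cumulative frequency reaches N/2.

```python
def grouped_median(boundaries, frequencies):
    """Median = L + ((N/2 - CF) / f) * h for a continuous frequency table.

    boundaries: list of (lower, upper) class boundaries in ascending order.
    frequencies: class frequencies in the same order.
    """
    n = sum(frequencies)
    cf_before = 0                          # CF before the current class
    for (lower, upper), f in zip(boundaries, frequencies):
        if cf_before + f >= n / 2:         # first class whose CF reaches N/2
            h = upper - lower              # class width
            return lower + ((n / 2 - cf_before) / f) * h
        cf_before += f
    raise ValueError("empty frequency table")

# Hypothetical table: N = 20, so N/2 = 10 falls in the class 19.5-29.5.
median = grouped_median([(9.5, 19.5), (19.5, 29.5), (29.5, 39.5)], [5, 10, 5])
print(median)   # 24.5
```

Here L = 19.5, CF = 5, f = 10 and h = 10, so Median = 19.5 + (5 / 10) × 10 = 24.5, the same X-value the Ogive method would read off at N/2.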