Unit 6.2 - Collection of Data
Introduction to Data Collection
Data collection is a component of research in all fields of study. It is the process of gathering and measuring information on targeted quantitative and qualitative variables using an established system or instruments (existing, modified or newly developed). While methods vary by discipline, the emphasis on ensuring accurate and honest collection remains the same. A formal and accurate data collection process is essential to maintain the integrity of research, because it reduces the likelihood of errors (quality assurance: action before data collection; quality control: action during and after data collection).
Data Collection Steps
- Determine what information you want to collect
- Set a time-frame for data collection
- Determine your data collection method
- Collect the data
- Analyze the data and implement your findings
Data Collection Problems (Requiring Prompt Action)
- Systematic errors
- Violation of protocol
- Fraud or scientific misconduct
- Errors in individual data items
- Individual staff or site performance problems
Pre-requisites or Preliminaries for Data Collection
- Objectives and Scope of the enquiry: The objective indicates the nature of the statistics to be collected and the statistical techniques to be employed. The scope relates to the coverage with respect to the type of information, subject matter and geographical location.
- Statistical units to be used:
- Physical Units – kg. or pound, km. or miles, ropani or bigha etc.
- Arbitrary units – person, family, location etc.
- Sources of information: Primary or secondary or both
- Method of data collection: If primary data is used, data can be collected by the census or sample method.
- Degree of accuracy: e.g., a 95% confidence level, i.e., a 5% chance of error
- Type of enquiry: (a) Official, semi-official or unofficial (b) Initial or repetitive (c) Direct or indirect (d) Regular or ad hoc (e) Census or sample (f) Primary or secondary
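The degree of accuracy decides how large a sample must be. The notes give the accuracy only as a confidence level; turning it into a sample size also needs a margin of error. A minimal sketch, assuming Cochran's formula for estimating a proportion, n0 = z^2 * p(1 - p) / e^2, with an assumed margin of error e = 0.05 and the most conservative guess p = 0.5. The function names and the population size N = 500 are hypothetical.

```python
import math
from statistics import NormalDist


def cochran_sample_size(confidence: float, margin_of_error: float, p: float = 0.5) -> int:
    """Minimum sample size for estimating a proportion (Cochran's formula).

    n0 = z^2 * p * (1 - p) / e^2, where z is the two-sided critical value
    for the chosen confidence level and e is the margin of error.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    n0 = z ** 2 * p * (1 - p) / margin_of_error ** 2
    return math.ceil(n0)


def adjust_for_finite_population(n0: int, population_size: int) -> int:
    """Finite population correction: n = n0 / (1 + (n0 - 1) / N)."""
    return math.ceil(n0 / (1 + (n0 - 1) / population_size))


n0 = cochran_sample_size(confidence=0.95, margin_of_error=0.05)
print(n0)                                     # 385
print(adjust_for_finite_population(n0, 500))  # 218 for a hypothetical N = 500
```

With p = 0.5 the product p(1 - p) is at its largest, so the result is a safe upper bound when the true proportion is unknown.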
Data Collection Methods
There are many ways of classifying data. A common classification is based on who collected the data (i.e., the approach to information gathering, or the source). On this basis, data can be categorized as:
- Primary data
- Secondary data
Primary Data
Primary data is new information obtained directly from a first-hand source, under the investigator's control and supervision, using methods such as surveys, interviews or experiments. Primary data is typically first-party, raw and original in nature.
Advantages of Using Primary Data
- High level of control over data collection: the design, method, and data analysis techniques to be used
- Collection of data specific to the problem (resolves specific research issues)
- Better accuracy or quality
- Up-to-date, real-time data
- The investigator owns the data
- Additional data can be obtained during the study period
Disadvantages of Using Primary Data
- Expensive
- More time consuming
- May not be feasible to collect because of complexity and the commitment required
Secondary Data
Secondary data is public or existing second-hand information collected and recorded by someone else for some other purpose, but used by the investigator for another purpose. It is typically free or inexpensive to obtain and easily accessible. It is readily available data collected from various sources such as censuses, government publications, internal records of organizations, reports, books, journal articles and websites.
Advantages of Using Secondary Data
- The data is already there, so there is no hassle of data collection
- Less expensive
- Less time-consuming
- The investigator is not personally responsible for the quality of the data
Disadvantages of Using Secondary Data
- The investigator cannot decide what is collected (for instance, when specific data about something is required).
- One can only hope that the data is of good quality.
- Obtaining additional data (or even clarification) is usually not possible.
- The data may be outdated.
Comparison Between Primary and Secondary Data
Similarities Between Primary and Secondary Data
- Both are statistical data.
- Both are helpful in statistical investigation.
- Both can be quantitative and qualitative.
Differences Between Primary and Secondary Data
| S.No. | Basis | Primary Data | Secondary Data |
|---|---|---|---|
| 1 | Meaning | First-hand data generated by the researcher | Second-hand data collected earlier by someone else |
| 2 | Originality | Original | Not original |
| 3 | Data type | Real time (time-sensitive) | Stale (no longer new or fresh) |
| 4 | Problem-solving capability | More (problem-specific) | Less (not problem-specific) |
| 5 | Process | Very involved | Rapid and easy |
| 6 | Sources (tools) | Surveys, observation, experiments, questionnaires, interviews | Government publications, websites, books, journals, internal records |
| 7 | Time, cost and manpower | More (expensive) | Less (economical) |
| 8 | Control | Yes (direct supervision) | Less |
| 9 | Form of available data | Crude form (raw data) | Refined (finished) form of primary data |
| 10 | Accuracy and reliability | More | Relatively less |
| 11 | Precaution and editing | Not required | Required |
| 12 | Proprietary information | Ownership; data remain hidden from competitors | No ownership; competitors have access to the data |
| 13 | Personal prejudice | Possible | Less likely |
| 14 | Relevance | Relevant to the user's need | May not be relevant to the user's need |
| 15 | Advantage | Authentic, specific, up to date | Very cheap and not time-consuming |
| 16 | Disadvantage | Costly and time-consuming | May be outdated or irrelevant |
|
Methods of Collecting Primary Data
The choice of method is influenced by the data collection
strategy, the type of variable, the accuracy required, the collection point and
the skill of the enumerator. The main traditional data collection methods are:
1. Direct Personal Interviews
An interview is a conversation with a purpose. Forms are completed through an interview (an oral questionnaire) with the respondent, face-to-face, by telephone or online.
In interviews, information is obtained through inquiry (oral or verbal responses) and recorded by enumerators. Structured interviews are performed using survey forms, whereas in open interviews notes are taken while talking with respondents. Interviews are more expensive than questionnaires, but they are better suited to complex questions, low literacy or less cooperative respondents.
Merits
- An interview is a more flexible tool than any written inquiry form and permits explanation, adjustment and variation according to the situation.
- A researcher or interviewer can interact with respondents and learn their inner feelings and reactions.
- In-depth information
- Accurate information
Demerits
- Expensive method
- Possibility of interviewer or respondent bias
- More time consuming
- The interviewers must be well-trained in the necessary soft skills and the relevant subject matter.
2. Indirect Oral Investigations
This method is applied when a direct personal interview is not suitable, for example when the information sought is sensitive and relates to wealth, corruption, prostitution or illegal activities. Mediators or witnesses provide the information or data. Police and investigation departments mostly use this method.
Merits
- Less time consuming, less expensive
- Wide area coverage
Demerits
- Cross-examination is necessary if a witness is biased
- Personal bias
3. Questionnaire
It consists of a number of questions printed or typed in a definite order on a form. Forms can be handed out or sent by mail (post, email) or fax, and are later completed and returned by respondents. It is an inexpensive method that is useful where literacy rates are high and respondents are cooperative.
Essentials of a Good Questionnaire
- It should be short and simple
- Questions should proceed in a logical sequence
- Technical terms and vague expressions must be avoided
- Control questions to check the reliability of the respondent must be present
- Adequate space for answers must be provided
- Brief directions with regard to filling up of questionnaire must be provided
- The physical appearance (quality of paper, color) must be good enough to attract the attention of the respondent
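The list above asks for control questions that check the reliability of the respondent. A minimal sketch, assuming a hypothetical questionnaire where "What is your age in completed years?" is cross-checked by the control question "In which year were you born?". The `Response` class, `flag_inconsistent` function and survey year 2024 are illustrative assumptions, not part of the original notes.

```python
from dataclasses import dataclass


@dataclass
class Response:
    respondent_id: str
    age: int          # main question: "What is your age in completed years?"
    birth_year: int   # control question: "In which year were you born?"


def flag_inconsistent(responses, survey_year, tolerance=1):
    """Return IDs of respondents whose main and control answers disagree.

    The age implied by the birth year can exceed the stated age by one
    if the respondent's birthday has not yet come in the survey year,
    so differences up to `tolerance` years are accepted.
    """
    flagged = []
    for r in responses:
        implied_age = survey_year - r.birth_year
        if abs(implied_age - r.age) > tolerance:
            flagged.append(r.respondent_id)
    return flagged


responses = [
    Response("R1", age=30, birth_year=1994),
    Response("R2", age=25, birth_year=1990),  # inconsistent answers
    Response("R3", age=40, birth_year=1984),
]
print(flag_inconsistent(responses, survey_year=2024))  # ['R2']
```

Flagged forms would then be re-checked or treated with caution during editing rather than discarded automatically.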
Merits
- Economical (money, time & manpower)
- Covers a wide area: large amounts of information can be collected from a large number of people in a short period of time
- Free from bias of interviewer
- Respondents have adequate time to give answers
Demerits
- Problems relating to question construction
- Low response rate (low rate of return of duly filled questionnaires)
- Possibility of ambiguous or omitted replies
- Difficulty in obtaining mailing addresses
- Inflexible (control over the questions is lost once the form is sent)
- It can be used only when the respondents are educated and cooperative.
4. Schedule
Merits
- Wide area coverage: large amounts of information can be collected from a large number of people
- Supplementary information
Demerits
- Very expensive
- More time consuming
- Bias of the enumerators cannot be ruled out
5. Information Received Through Local Agencies (Correspondents)
Under this method, the investigator appoints local agents or correspondents in different places to collect information. These correspondents (generally paid staff) collect and transmit the information to the central office, where the data are processed. This method is generally adopted by newspaper agencies and government departments to obtain information at regular intervals from a wide area.
Merits
- Cheap
- Appropriate for extensive investigation
Demerits
- Does not always ensure accurate results because of the personal prejudice and bias of the correspondents.
- Requires skilled and experienced correspondents
Sources of Secondary Data
Secondary data is often readily available and easily accessible. With the expansion of electronic media and the internet, secondary data has become much easier to obtain. Major sources of secondary data are as follows:
- Published Sources
- Unpublished Sources
1. Published Sources
There is a variety of published print and electronic sources, such as books, journals/periodicals, magazines and newspapers, e-journals from e-libraries, websites, and weblogs or blogs (personal online diaries). Their credibility depends on many factors, for example the writer, the publishing company, and the time and date of publication. Newer sources are preferred and older sources should be avoided, as new technology and research bring new facts to light. A few major published sources of secondary data are mentioned below:
- Government Statistics: Government (three-tier) statistics are widely available and easily accessed, and can provide insights into trade activity, business formation, patents, pricing and economic trends, among other topics. In Nepal, publications of MOF, CBS, NRB, NPC and other ministries are government statistics.
- Semi-government Statistics: TU, NBL, NIDC, NTC, NEA
- International Publications: Data published by international organizations is collected by researching a wider population. Such organizations include WB, IMF, WHO, WTO, UN, ILO, ADB, etc.
- Industry Associations & Company Websites: Some information may be accessible to members only (such as member directories or market research), but these are a great place to look when starting to learn about a new industry or when looking for information such as annual reports and regulatory findings. Reports and publications of trade unions, chambers of commerce, BFIs and the stock exchange are also published sources of secondary data.
- Research Institutions: Economists, research scholars, universities and other educational and research institutions
- Newspapers and Magazines: Many newspapers and magazines are a useful, cheap, accessible and reliable source of secondary data. Popular international examples are The Economist, Money, Frontline, Bloomberg Businessweek, Entrepreneur, The New Yorker, Forbes, The Wall Street Journal and Business World.
2. Unpublished Sources
Some records maintained by private firms, business enterprises, scholars and research workers may not be released to outside agencies, or may not be readily available and easily accessible. Some of the major unpublished sources from which secondary data can be gathered are:
- Diaries: Rarely available personal records
- Letters: Reliability should be checked before using them.
- Government Records: important for marketing, management, humanities and social science research.
- Public Sector Records: NGOs’ survey data, health records, police records
- Databases or records of private companies
Desirable Qualities (Reliability) of Secondary Data
Because of the disadvantages of secondary data (errors, inconsistency), its quality must be evaluated before use. Evaluation means that the following four requirements must be satisfied:
- Availability- Secondary data should be easily available. If secondary data is not available as required, it is necessary to go for primary data.
- Reliability - The reliability of published statistics may vary over time. Reliability can be verified by asking: (a) Who collected the data? (b) From what source? (c) Using which methods (sample size, response rate, questionnaire design, modes of analysis)? (d) Is there a possibility of bias in stratifying the sample (geographical or administrative boundaries)? (e) What is the accuracy or error?
- Suitability – Suitability indicates the relevance or applicability of the data to the current problem. The data must be suitable with respect to the (a) research objective (b) units of measurement (c) concepts and currency (d) current time (e.g., census data may be out of date, or the data may have been collected in an abnormal year)
- Adequacy – Sufficient data must be available for the study. The data are considered inadequate (a) if the level of accuracy achieved is insufficient, or (b) if they relate to an area (or scope) that is either narrower or wider than that of the present enquiry.
Census and Sample
Census and sampling are methods of collecting survey data about the population that are used by many countries.
Census
When data are collected for each and every element (unit) of the population, i.e., the entire set of possible observations (complete statistical enumeration), the method is termed the census method. Hence, this method requires a great deal of money, time and labour for gathering information. In our country, the Government conducts the Census of Nepal every 10 years. The census reveals demographic information such as birth rates, death rates, total population and the population growth rate of the country.
Merits and Demerits of Census Method
Sample
A sample is a subset (fraction) of the population (statistical enumeration of a subgroup) from which information is actually collected and which represents the entire group. This method is used when the population size is very large. The units that constitute the sample are called ‘Sampling Units’. The full-fledged list containing all sampling units is called the ‘Sampling Frame’. Examples: a blood test, or a chef tasting food while cooking.
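The difference between the two methods can be illustrated with a simulated population. A minimal sketch, assuming a hypothetical population of 1,000 household incomes: the census mean uses every unit, while the sample mean uses 100 sampling units drawn from a sampling frame of unit IDs. The gap between the two is the sampling error. The data and seed are invented for illustration.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: monthly incomes (in thousand rupees) of 1,000 households
population = [rng.randint(10, 100) for _ in range(1000)]

# Census method: every unit of the population is enumerated
census_mean = mean(population)

# Sample method: the sampling frame lists every unit ID; 100 sampling units are drawn
sampling_frame = list(range(len(population)))
sample_ids = rng.sample(sampling_frame, k=100)
sample_mean = mean(population[i] for i in sample_ids)

print(f"Census mean:    {census_mean:.2f}")
print(f"Sample mean:    {sample_mean:.2f}")
print(f"Sampling error: {sample_mean - census_mean:+.2f}")
```

Rerunning with a different seed gives a different sample and a different sampling error, while the census mean stays fixed.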
Merits and Demerits of Sample Method
Difference between Census and Sampling Methods
| S.No. | Basis for Comparison | Census | Sampling |
|---|---|---|---|
| 1 | Meaning | A systematic method that collects and records data about every member of the population | A portion of the population selected to represent the entire group in all its characteristics |
| 2 | Enumeration | Complete (extensive) | Partial (limited) |
| 3 | Study of | Each and every unit of the population | Only a handful of units of the population |
| 4 | Time required | Time-consuming process | Fast process |
| 5 | Cost | Expensive method | Economical method |
| 6 | Results | Reliable and accurate | Less reliable and accurate, due to the margin of error in the data collected |
| 7 | Error | No sampling error (non-sampling errors can still occur) | Sampling error is present; it depends mainly on the size of the sample |
| 8 | Appropriate for | Population of heterogeneous nature | Population of homogeneous nature |
| 9 | Verification | Cannot be verified | Results can be tested by drawing another small sample |
| 10 | Nature of method | Old and not a very scientific method | New, practicable and scientific method |
Methods of Sampling
If the population is too large for the researcher to survey all of its members, a small but carefully chosen sample can be used to represent the population. Generally, there are two methods of selecting samples from the population:
- Probability or Random Sampling Methods
- Non-Probability or Non- Random Sampling Methods
Difference Between Probability and Non-Probability Sampling
| Basis | Probability Sampling Methods | Non-Probability Sampling Methods |
|---|---|---|
| Definition | Based on the theory of probability | Based on the subjective judgment of the researcher rather than random selection |
| Selection from the population | Random (each unit has an equal, or at least a known non-zero, chance) | Arbitrary (some units have no chance or an unequal chance) |
| Sampling error (difference from the population) | Can be calculated | Remains unknown |
| Use in market research | Conclusive in nature | Exploratory in nature |
| Time taken | Longer | Quick |
| Results | Unbiased and conclusive | Biased and speculative |
| Hypothesis | Formulated before the study begins | Derived after conducting the study |
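The table notes that sampling error can be calculated for probability samples. A minimal sketch, assuming a simple random sample and the usual estimate of the standard error of the mean, s / sqrt(n), with an optional finite population correction for sampling without replacement. The data and function name are hypothetical.

```python
import math
from statistics import mean, stdev


def standard_error_of_mean(sample, population_size=None):
    """Estimated standard error of the sample mean, s / sqrt(n).

    If population_size (N) is given, the finite population correction
    sqrt((N - n) / (N - 1)) is applied, as for sampling without replacement.
    """
    n = len(sample)
    se = stdev(sample) / math.sqrt(n)  # stdev uses the n - 1 divisor
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se


# Hypothetical simple random sample of 8 observations
sample = [12, 15, 11, 18, 14, 16, 13, 17]
print(mean(sample))                              # 14.5
print(round(standard_error_of_mean(sample), 3))  # 0.866
```

No comparable formula exists for a non-probability sample, because the selection chances of the units are unknown.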
(A.) Probability or Random Sampling Methods
Each individual in the population has a known, non-zero chance of being selected; in simple random sampling, every individual has an equal chance.
Types of Probability or Random Sampling
1. Simple Random Sampling
Process
- Assign a number to each individual in the population, e.g., label them 00 to 50.
- To pick a total of 15 individuals, read two-digit numbers from a random number table, skipping numbers outside 00 to 50 and numbers already chosen. Examples of published tables: Tippett (1927), Fisher & Yates (1938), Kendall & Babington Smith (1939), Rand Corporation (1955), C.R. Rao, Mitra & Mathai (1966).
- Alternatively (lottery method), the numbers are placed in a bowl and thoroughly mixed; then a blindfolded researcher selects n numbers.
- A computer random number generator or a mechanical device can also be used to select the sample.
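The random-number-table steps above can be mimicked in code. A minimal sketch, assuming a pseudo-random generator stands in for a printed table, so the selections will not match any of the published tables listed. The function name and seed are hypothetical; `random.sample` from the Python standard library does the same job directly.

```python
import random


def random_number_table_sample(labels, n, seed=None):
    """Mimic the random-number-table method for simple random sampling.

    Two-digit numbers 00-99 are read one at a time; numbers that are not
    a valid label (e.g. 51-99) and numbers already chosen are skipped.
    Labels must be two-digit strings such as "07".
    """
    valid = set(labels)
    if any(len(label) != 2 or not label.isdigit() for label in valid):
        raise ValueError("labels must be two-digit strings 00-99")
    if n > len(valid):
        raise ValueError("sample size exceeds the number of labelled individuals")
    rng = random.Random(seed)
    selected = []
    while len(selected) < n:
        number = f"{rng.randint(0, 99):02d}"  # next two-digit entry in the "table"
        if number in valid and number not in selected:
            selected.append(number)
    return selected


labels = [f"{i:02d}" for i in range(51)]  # individuals labelled 00 to 50
print(random_number_table_sample(labels, n=15, seed=7))

# The standard library does the same job directly (sampling without replacement):
print(random.Random(7).sample(labels, k=15))
```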
Merits
- Most straightforward method of probability sampling.
- It is used when we don't have any kind of prior information about the target population.
- Each member of the population has an equal and known chance of being selected.
- Allows the sampling error to be calculated and reduces selection bias.
Demerits
- We may not select enough individuals with our characteristic of interest, especially if that characteristic is uncommon.
- If sampling units are scattered over a wide geographical area, then it may also be difficult to define a complete sampling frame and inconvenient to contact (email, phone, post).
2. Systematic Random Sampling
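Systematic random sampling selects units at regular intervals from a numbered population: a starting unit is chosen at random, and then every k-th unit is selected, where the interval k is the population size divided by the sample size.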
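A minimal sketch of systematic random sampling, assuming the interval k = N // n and a random start between the first and k-th unit. The numbered population 1 to 100 and the seed are hypothetical.

```python
import random


def systematic_sample(population, n, seed=None):
    """Systematic random sampling: random start, then every k-th unit, k = N // n."""
    N = len(population)
    if not 0 < n <= N:
        raise ValueError("n must be between 1 and the population size")
    k = N // n                                # sampling interval
    start = random.Random(seed).randrange(k)  # random start in 0 .. k-1
    return [population[start + i * k] for i in range(n)]


units = list(range(1, 101))  # hypothetical numbered population 1..100
print(systematic_sample(units, n=10, seed=3))
```

When N is not a multiple of n, this simple version can never select the last N - n*k units; circular systematic sampling is one common way around that.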
3. Stratified Random Sampling
In this technique, the subjects are first divided into groups, or strata, that share at least one common characteristic, such as age, gender, education level, religion or income. The researcher then randomly selects a sufficient number of subjects from each stratum to form the final sample. It is also known as proportional or quota random sampling. The strata must not overlap.
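A minimal sketch of stratified random sampling, assuming proportional allocation (each stratum contributes in proportion to its size) and simple random sampling within each stratum. The population of 600 women and 400 men and the function name are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, n, seed=None):
    """Stratified random sampling with proportional allocation.

    Units are split into non-overlapping strata; stratum h receives
    round(n * N_h / N) units, drawn by simple random sampling within it.
    Rounding can make the total differ slightly from n.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)  # each unit falls in exactly one stratum
    N = len(units)
    return {
        name: rng.sample(members, round(n * len(members) / N))
        for name, members in strata.items()
    }


# Hypothetical population of 1,000 people: 600 women (F) and 400 men (M)
people = [(i, "F") for i in range(600)] + [(i, "M") for i in range(600, 1000)]
sample = stratified_sample(people, stratum_of=lambda p: p[1], n=50, seed=1)
print({stratum: len(chosen) for stratum, chosen in sample.items()})  # {'F': 30, 'M': 20}
```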
Merits
- This method is appropriate when the population has mixed characteristics
- Useful for studying particular subgroups within the population; the sample is highly representative
- Superior to simple random sampling because it can reduce sampling error
Demerits
- It requires knowledge of the appropriate characteristics of the sampling frame and it can be difficult to decide which characteristics to stratify by.
- It is not useful when there are no homogeneous subgroups.
- Can be expensive to implement
4. Cluster Random Sampling
Cluster random sampling is used when the population is very large or spread over a wide geographical region.
Difference between Stratified and Cluster Sampling
- Stratified sampling: the sample includes elements from each stratum.
- Cluster sampling: the sample includes elements only from the sampled clusters.
Process
- First, identify the boundaries of the population
- Then, split the population into groups (clusters), e.g., on the basis of age, gender or location
- Randomly select a number of the identified areas (groups) as clusters having similar characteristics
- Either include all the individuals or select subjects randomly within the selected clusters
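The process above can be sketched as follows, assuming hypothetical wards as clusters and households as individuals. With `per_cluster=None` every household in a chosen ward is kept; otherwise that many households are drawn at random within each chosen ward. The names and seed are illustrative only.

```python
import random


def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """Randomly select whole clusters.

    per_cluster=None keeps every individual in a chosen cluster (one-stage);
    otherwise that many individuals are drawn at random within each chosen
    cluster (two-stage).
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # random choice of clusters
    result = {}
    for name in chosen:
        members = clusters[name]
        if per_cluster is None:
            result[name] = list(members)
        else:
            result[name] = rng.sample(members, min(per_cluster, len(members)))
    return result


# Hypothetical wards (clusters) and their households
wards = {
    "Ward 1": ["H01", "H02", "H03", "H04"],
    "Ward 2": ["H05", "H06", "H07"],
    "Ward 3": ["H08", "H09", "H10", "H11", "H12"],
    "Ward 4": ["H13", "H14"],
    "Ward 5": ["H15", "H16", "H17", "H18"],
}
print(cluster_sample(wards, n_clusters=2, seed=5))
print(cluster_sample(wards, n_clusters=2, per_cluster=2, seed=5))
```

Unlike the stratified sketch, households in wards that are not chosen have no representation at all in the sample.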
(B.) Non-Probability or Non-Random Sampling Methods
Samples are gathered in a process that does not
give all the individuals in the population equal chances of being selected or
does not involve random selection.
Types of Non-Probability or Non-Random Sampling
1. Convenience Sampling
This is the easiest and most economical method of sampling, because participants are selected based on availability, reach, accessibility and willingness to take part. Examples include surveying movie viewers at a cinema hall, email surveys, or drawing on a telephone directory, an industrial directory or a hospital's record of childbirths. It is a quick and easy way of choosing participants (advantage), but it may not provide a representative sample and could be biased (disadvantage).
2. Judgment Sampling
In judgmental (purposive or deliberate) sampling, the researcher selects subjects they believe are more suitable or representative for the research than other individuals. For example, the media canvassing the public for opinions, or a coach selecting 15 players out of 30 for the national cricket team. Judgment sampling has the advantage of being time- and cost-effective to perform. However, in addition to volunteer bias, it is also prone to errors of judgment by the researcher, which can distort the findings.
3. Quota Sampling
The researcher ensures equal or proportionate representation of subjects from a heterogeneous population, based on the trait chosen as the basis of the quota; subjects within each quota are not selected randomly. This method of sampling is often used by market researchers. For example, in a TV viewing survey with population size N = 500 and sample size n = 100:
- Adult men = 30 (30%)
- Adult women = 30 (30%)
- Teenage girls = 20 (20%)
- Teenage boys = 20 (20%)
Whilst this has the advantage of being relatively straightforward and potentially representative, the chosen sample may not be representative of other characteristics that weren't considered.
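The quota arithmetic in the TV viewing example can be sketched as follows, assuming respondents are accepted in the order they turn up until each group's quota is full. This non-random filling is what separates quota sampling from stratified random sampling. The arrival stream and function names are hypothetical.

```python
def quota_targets(proportions, n):
    """Number of respondents to recruit from each group: proportion * n."""
    return {group: round(share * n) for group, share in proportions.items()}


def fill_quotas(arrivals, targets):
    """Accept respondents in the order they turn up (not randomly)
    until every group's quota is full; later arrivals are turned away."""
    counts = {group: 0 for group in targets}
    accepted = []
    for person, group in arrivals:
        if group in targets and counts[group] < targets[group]:
            counts[group] += 1
            accepted.append(person)
        if counts == targets:  # all quotas filled
            break
    return accepted, counts


shares = {"adult men": 0.30, "adult women": 0.30, "teenage girls": 0.20, "teenage boys": 0.20}
targets = quota_targets(shares, n=100)
print(targets)  # {'adult men': 30, 'adult women': 30, 'teenage girls': 20, 'teenage boys': 20}

# Hypothetical stream of 200 viewers arriving in a fixed rotation of groups
groups = list(shares)
arrivals = [(f"viewer{i}", groups[i % 4]) for i in range(200)]
accepted, counts = fill_quotas(arrivals, targets)
print(len(accepted), counts)
```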
4. Snowball Sampling
Snowball (referral) sampling involves finding a small group of initial respondents and using them to recruit more respondents through word of mouth. The referral process continues, increasing the size of the sample like a snowball. It is particularly useful, working through social networks, when the desired sample characteristic is rare or when respondents are difficult (hidden) to trace. While this technique can dramatically lower search costs, it comes at the expense of introducing bias, because the technique itself reduces the likelihood that the sample will represent a good cross-section of the population. Examples: surveys to gather information about HIV/AIDS, contact tracing for COVID-19, etc.
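The referral process can be sketched as a wave-by-wave walk through a network, assuming each respondent names the people they know and recruitment stops at a maximum sample size. The referral network below is hypothetical.

```python
from collections import deque


def snowball_sample(referrals, seeds, max_size):
    """Grow the sample wave by wave: each respondent refers people they know."""
    sample = list(seeds)
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        for contact in referrals.get(person, []):
            if contact not in seen and len(sample) < max_size:
                seen.add(contact)  # never recruit the same person twice
                sample.append(contact)
                queue.append(contact)
    return sample


# Hypothetical referral network: who each respondent names
referrals = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "E": ["F"],
}
print(snowball_sample(referrals, seeds=["A"], max_size=5))  # ['A', 'B', 'C', 'D', 'E']
```

Anyone not connected to the initial respondents can never be reached, which is the source of the bias described above.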