Unit 6.2 - Collection of Data

Introduction to Data Collection

Data collection is a component of research in every field of study. It is the process of gathering and measuring information on targeted quantitative and qualitative variables in an established, systematic way, using instruments that may be existing, modified or newly developed.
While methods vary by discipline, the emphasis on accurate and honest collection remains the same. A formal and accurate data collection process is essential to maintain the integrity of research because it reduces the likelihood of errors. Two safeguards support it: quality assurance (actions taken before data collection) and quality control (actions taken during and after data collection).

Data Collection Steps

  1. Determine what information you want to collect
  2. Set a time-frame for data collection
  3. Determine your data collection method
  4. Collect the data
  5. Analyze the data and implement your findings

Data Collection Problems (Necessitating Prompt Action)

  1. Systematic errors
  2. Violation of protocol
  3. Fraud or scientific misconduct
  4. Errors in individual data items
  5. Individual staff or site performance problems

Pre-requisites or Preliminaries for Data Collection

  1. Objectives and Scope of the enquiry: Objective highlights the nature of statistics to be collected & statistical techniques to be employed. Scope relates to the coverage with respect to the type of information, subject matter and geographical location.
  2. Statistical units to be used: 
    • Physical Units – kg. or pound, km. or miles, ropani or bigha etc.
    • Arbitrary units – person, family, location etc.
  3. Sources of information: Primary or secondary or both
  4. Method of data collection – If primary data is used then we can collect data using census or sample method.
  5. Degree of accuracy: 95 % confidence level or 5% error
  6. Type of enquiry: (a) Official, semi-official or un-official (b) Initial or repetitive (c) Direct or indirect (d) Regular or ad-hoc (e) Census or sample (f) Primary or secondary
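
The degree of accuracy in item 5 can be made concrete with a small calculation. The sketch below assumes one common reading of those figures: a 95% confidence level combined with a 5% margin of error for an estimated proportion, using the normal approximation. The function names and the conservative choice p = 0.5 are illustrative assumptions, not part of the original notes.

```python
import math

Z_95 = 1.96  # standard normal critical value for a 95% confidence level


def margin_of_error(p_hat, n, z=Z_95):
    """Approximate margin of error for a sample proportion (normal approximation)."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


def required_sample_size(margin, p=0.5, z=Z_95):
    """Smallest n whose margin of error stays within `margin` for a large population."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


if __name__ == "__main__":
    n = required_sample_size(0.05)  # p = 0.5 is the most conservative assumption
    print(n, margin_of_error(0.5, n))
```

Under these assumptions, a tighter margin of error or a higher confidence level requires a larger sample, which is why the degree of accuracy has to be fixed before the data are collected.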

Data Collection Methods

Data can be classified in many ways. A common classification is based on who collected the data, that is, on the source or approach to information gathering. On this basis, data can be categorized as:
  • Primary data
  • Secondary data

Primary Data

Primary data is new information obtained directly from the first-hand source, under the investigator's own control and supervision, using methods such as surveys, interviews or experiments. Primary data is typically first-party, raw data and is original in nature.

Advantages of Using Primary Data

  1. High level of control over data collection for design, method, and data analysis techniques to be used.
  2. Collection of data specific to the problem (resolve specific research issues)
  3. Better accuracy or quality
  4. Up to date real-time data
  5. Ownership of the data
  6. Possibility of obtaining additional data during the study period

Disadvantages of Using Primary Data

  1. Expensive
  2. More time consuming
  3. May not be feasible because of the complexity and commitment required

Secondary Data

Secondary data is public/existing or second-hand information collected and recorded by someone else for some other purpose (but being utilized by the investigator for another purpose). It is typically free or inexpensive to obtain and easily accessible.
It is the readily available form of data collected from various sources like censuses, government publications, internal records of the organization, reports, books, journal articles, websites and so on.

Advantages of Using Secondary Data

  1. The data already exist, so there is no hassle of data collection
  2. Less expensive
  3. Less time consuming
  4. The investigator is not personally responsible for the quality of data

Disadvantages of Using Secondary Data

  1. The investigator cannot decide what is collected (if specific data about something is required, for instance).
  2. One can only hope that the data is of good quality
  3. Obtaining additional data (or even clarification) about something is not possible (most often)
  4. The data may be outdated

Comparison Between Primary and Secondary Data

Similarities Between Primary and Secondary Data

  1. Both are statistical data.
  2. Helpful in the statistical investigation
  3. Both can be quantitative or qualitative

Differences Between Primary and Secondary Data



| S. No. | Basis | Primary Data | Secondary Data |
|---|---|---|---|
| 1 | Meaning | First-hand data generated by the researcher himself | Second-hand data collected by someone else earlier |
| 2 | Originality | Original | Not original |
| 3 | Data Types | Real time (time sensitive) | Stale (no longer new or fresh) |
| 4 | Capability of Problem Solving | More (problem specific) | Less (not problem specific) |
| 5 | Process | Very involved | Rapid and easy |
| 6 | Sources (Tools) | Surveys, observation, experiments, questionnaire, interview | Govt. publications, websites, books, journals, internal records |
| 7 | Time, Cost & Manpower | More (expensive) | Less (economical) |
| 8 | Control | Yes (direct supervision) | Lesser |
| 9 | Nature of Data Availability | Crude form (raw data) | Refined (finished) form of primary data |
| 10 | Accuracy & Reliability | More | Relatively less |
| 11 | Precaution & Editing | Not required | Required |
| 12 | Proprietary Information | Ownership & data remain hidden from the competitors | No ownership & competitors have access to the data |
| 13 | Personal Prejudice | Possibility | Less possibility |
| 14 | Relevancy | Relevant to the user's need | May not be relevant to the user's need |
| 15 | Advantage | Authentic, specific, up to date | Very cheap and not time-consuming |
| 16 | Disadvantage | Costly & time consuming | May be outdated or irrelevant |

Methods of Collecting Primary Data

The choice of method is influenced by the data collection strategy, the type of variable, the accuracy required, the collection point and the skill of the enumerator. The main traditional data collection methods are:

1. Direct Personal Interviews

An interview is a conversation with a purpose. Forms are completed through an interview (an oral questionnaire) with the respondent, conducted face to face, by telephone or online.
In interviews, information is obtained through inquiry (oral or verbal responses) and recorded by enumerators. Structured interviews are conducted using survey forms, whereas open interviews rely on notes taken while talking with respondents. Interviews are more expensive than questionnaires, but they are better suited to complex questions, low literacy or less co-operative respondents.

Merits

  1. An interview is a more flexible tool than any written inquiry form and permits explanation, adjustment and variation according to the situation.
  2. A researcher or an interviewer can interact with his respondents and know their inner feelings and reactions.
  3. In-depth information
  4. Accurate information

Demerits

  1. Expensive method
  2. Possibility of interviewer or respondent bias
  3. More time consuming
  4. The interviewers must be well-trained in the necessary soft skills and the relevant subject matter.

2. Indirect Oral Investigations

This method is applied when a direct personal interview is not suitable, for example when the information is sensitive and relates to wealth, corruption, prostitution or illegal activities. A mediator or witness provides the information. Police and investigation departments mostly use this method.

Merits

  1. Less time consuming, less expensive
  2. Wide area coverage

Demerits

  1. Cross-examination is necessary if the witness is biased
  2. Personal bias

3. Questionnaire

A questionnaire consists of a number of questions printed or typed in a definite order on a form. Forms can be handed out or sent by mail (post or email) or fax, and are later completed and returned by respondents. It is an inexpensive method that is useful where literacy rates are high and respondents are co-operative.

Essentials of a Good Questionnaire

  • It should be short and simple
  • Questions should proceed in a logical sequence
  • Technical terms and vague expressions must be avoided
  • Control questions to check the reliability of the respondent must be present
  • Adequate space for answers must be provided
  • Brief directions with regard to filling up of questionnaire must be provided
  • The physical appearance (quality of paper, color) must be good enough to attract the attention of the respondent

Merits

  1. Economical (money, time & manpower)
  2. Covers a wide area: large amounts of information can be collected from a large number of people in a short period of time
  3. Free from bias of interviewer
  4. Respondents have adequate time to give answers

Demerits

  1. Problems relating to question construction
  2. Low response rate (low rate of return of duly filled questionnaires)
  3. Possibility of ambiguous or omitted replies
  4. Difficulty in obtaining mailing addresses
  5. Inflexible (control over the questions is lost once the questionnaire is sent)
  6. It can be used only when the respondents are educated and cooperative.

4. Schedule

A schedule is a set of questions (or a formal list) that an interviewer (enumerator) asks and fills in during a face-to-face meeting with the respondent. The success of a schedule depends largely on the efficiency and tactfulness of the interviewer rather than on the quality of the questions posed. Enumerators explain the aims and objectives of the investigation and remove any difficulty a respondent may have in understanding the implications of a particular question. This method is helpful in extensive enquiries; however, it is very expensive and is usually adopted in investigations conducted by governmental agencies or by some organizations. Population censuses all over the world are conducted through this method.

Merits

  1. Wide area coverage: large amounts of information can be collected from a large number of people
  2. Supplementary information

Demerits

  1. Very expensive
  2. More time consuming
  3. Bias of the enumerators cannot be ruled out

5. Information Received Through Local Agencies (Correspondents)

The investigator appoints local agents or correspondents in different places to collect information under this method. These correspondents (generally paid staff) collect and transmit the information to the central office where data are processed. This method is generally adopted by newspaper agencies, government departments to obtain information at regular intervals from a wide area.

Merits

  1. Cheap
  2. Appropriate for extensive investigation

Demerits

  1. Does not always ensure accurate results because of the personal prejudice and bias of the correspondents.
  2. Requires skilled and experienced correspondents

Sources of Secondary Data

Secondary data is often readily available and easily accessible. Nowadays, with the expansion of electronic media and the internet, secondary data has become much easier to obtain. Major sources of secondary data are as follows:
  1. Published sources
  2. Unpublished sources

1. Published Sources

There is a variety of published printed and electronic sources, such as books, journals and periodicals, magazines and newspapers, e-journals from e-libraries, websites, and weblogs or blogs (personal written diaries). Their credibility depends on many factors, for example the writer, the publishing company, and the time and date of publication. Newer sources are generally preferred over older ones, as new technology and research bring new facts to light. A few major published sources of secondary data are mentioned below:
  1. Government Statistics: Government (3 tier) statistics are widely available and easily accessed, and can provide insights related to trade activity, business formation, patents, pricing and economic trends, among other topics. In Nepal, MOF, CBS, NRB, NPC and other ministerial publications are government statistics

  2. Semi-government Statistics: TU, NBL, NIDC, NTC, NEA

  3. International Publications: Data published by international organizations is collected by researching on a wider population. There are various international organizations such as WB, IMF, WHO, WTO, UN, ILO, ADB etc.

  4. Industry Associations & Company Websites: Some information may be accessible to members only (such as member directories or market research), but these are a great place to look when starting to learn about a new industry or when looking for information of annual reports, regulatory findings. Reports & publications of trade union, chamber of commerce, BFIs, stock exchange are also the published sources of secondary data.

  5. Research Institutions: Economists, research scholars, universities and other educational & research institutions 

  6. Newspapers and magazines: Many newspapers and magazines are a useful, important, cheap, accessible and reliable source of secondary data. Popular international examples are The Economist, Money, Frontline, Bloomberg Business Week, Entrepreneur, The New Yorker, Forbes, The Wall Street Journal and Business World.

2. Unpublished Sources

Some records maintained by private firms, business enterprises, scholars and research workers may not be released to outside agencies, or may not be readily available and easily accessible. Some of the major unpublished sources from which secondary data can be gathered are:
  1. Diaries: Rarely available personal records
  2. Letters: Reliability should be checked before using them.
  3. Government Records: important for marketing, management, humanities and social science research.
  4. Public Sector Records: NGOs’ survey data, health records, police records
  5. Databases or records of private companies.

Desirable Qualities (Reliability) of Secondary Data

Because secondary data can suffer from errors and inconsistencies, its quality must be evaluated before use. Evaluation means that the following four requirements must be satisfied:
  1. Availability - Secondary data should be easily available. If secondary data is not available as required, it is necessary to go for primary data.
  2. Reliability - The reliability of published statistics may vary over time. Reliability can be verified through questions such as: (a) Who collected the data? (b) From what source? (c) By which methods (sample size, response rate, questionnaire design, modes of analysis)? (d) Is there a possibility of bias in stratifying the sample (geographical or administrative boundaries)? (e) How accurate is the data, or how large is the error?
  3. Suitability - Suitability of data indicates the relevance or applicability of the data to the current problem. The data must be suitable with respect to (a) the research objective, (b) units of measurement, (c) concepts and currency, and (d) the current time (for example, census data may be out of date, or data may have been collected in an abnormal year).
  4. Adequacy - Sufficient data must be available for the study. The data is considered inadequate (a) if its level of accuracy is found inadequate, or (b) if it relates to an area (or scope) that is either narrower or wider than the area of the present enquiry.

Census and Sample

Census and sampling are methods of collecting survey data about the population that are used by many countries. 

Census

When data are collected for each and every element or unit of the population, covering the entire set of possible observations (complete statistical enumeration), the method is termed the Census Method. This method therefore requires a great deal of finance, time and labour for gathering information. In our country, the Government conducts the Census of Nepal every 10 years. The Census reveals demographic information such as birth rates, death rates, total population and the population growth rate of the country.

Merits and Demerits of Census Method

Sample

A sample is a subset or fraction of the population (statistical enumeration of a subgroup) from which information is actually collected and which represents the entire group. This method is used when the population size is very large. The units that constitute the sample are called 'Sampling Units'. The complete list containing all sampling units is called the 'Sampling Frame'. Examples: a blood test, or a chef tasting a dish while cooking.
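
The contrast between complete enumeration and a sample can be sketched in a few lines. The example below is only an illustration: the population of values 1 to 1000 is hypothetical, and the function names are assumptions introduced here.

```python
import random
import statistics


def census_mean(population):
    """Mean computed from every unit: complete enumeration."""
    return statistics.mean(population)


def sample_mean(population, n, seed=None):
    """Mean estimated from n units drawn at random without replacement."""
    rng = random.Random(seed)
    return statistics.mean(rng.sample(population, n))


# Hypothetical population: values 1..1000 standing in for some measured variable.
population = list(range(1, 1001))
print(census_mean(population))              # uses all 1000 units
print(sample_mean(population, 50, seed=1))  # uses only 50 units
```

The census figure is exact for this population, while the sample figure is an estimate that changes with the units drawn; the gap between the two is the sampling error.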

Merits and Demerits of Sample Method

Difference between Census and Sampling Methods

| S. No. | Basis for Comparison | Census | Sampling |
|---|---|---|---|
| 1 | Meaning | A systematic method that collects and records the data about the members of the population is called Census. | Sampling refers to a portion of the population selected to represent the entire group, in all its characteristics. |
| 2 | Enumeration | Complete (extensive) | Partial (limited) |
| 3 | Study of | Each and every unit of the population. | Only a handful of units of the population. |
| 4 | Time required | It is a time-consuming process. | It is a fast process. |
| 5 | Cost | Expensive method | Economical method |
| 6 | Results | Reliable and accurate | Less reliable and accurate, due to the margin of error in the data collected. |
| 7 | Error | Not present. | Depends on the size of the population |
| 8 | Appropriate/relevance for | Population of heterogeneous nature. | Population of homogeneous nature. |
| 9 | Verification | Cannot be verified | Results can be tested by taking out another small sample frame. |
| 10 | Nature of Method | Old and not a very scientific method | New, practicable and scientific method |

Methods of Sampling

If the population is too large for the researcher to attempt to survey all of its members, then a small but carefully chosen sample can be used to represent the population. Generally, there are two methods of selecting samples from the population.
  1. Probability or Random Sampling Methods
  2. Non-Probability or Non-Random Sampling Methods

Difference Between Probability and Non-Probability Sampling


| Basis | Probability Sampling Methods | Non-Probability Sampling Methods |
|---|---|---|
| Definition | Based on the theory of probability | Based on the subjective judgment of the researcher rather than random selection |
| Population selection | Randomly (equal or non-zero or >0 probability) | Arbitrarily (no chance or unequal chance) |
| Sampling Error (differ from population) | Can be calculated | Remains unknown |
| Market Research | Conclusive in nature | Exploratory in nature |
| Time Taken | Longer time | Quick |
| Results | Unbiased and conclusive | Biased and speculative |
| Hypothesis | Underlying before the study begins | Derived after conducting the research study |

(A.) Probability or Random Sampling Methods

Every individual in the population has an equal, or at least a known non-zero, chance of being selected.

Types of Probability or Random Sampling

1. Simple Random Sampling

The easiest and purest form of probability sampling, in which the desired sample is selected at random from the whole population.

Process

  1. Assign a number to each individual in the population, for example by labeling them 00 to 50.
  2. Pick the required number of individuals, say 15, by reading two-digit numbers from a random number table, e.g. Tippett (1927), Fisher & Yates (1938), Kendall & Babington Smith (1939), Rand Corporation (1955), or C.R. Rao, Mitra & Mathai (1966).
  3. Alternatively, place the numbers in a bowl and mix them thoroughly; a blindfolded researcher then selects n numbers.
  4. A computer random number generator or a mechanical device can also be used to select the sample.
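
The computer-based option in the process above can be sketched with Python's standard `random` module. This is a minimal illustration: the labels 00 to 50 and the sample of 15 follow the example in the steps, while the function name and the fixed seed are assumptions made here for reproducibility.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, each equally likely."""
    rng = random.Random(seed)
    return rng.sample(frame, n)


# Step 1: label every individual in the population, "00" to "50".
labels = [f"{i:02d}" for i in range(51)]

# Steps 2-4: let the random number generator pick 15 labels without replacement.
chosen = simple_random_sample(labels, 15, seed=42)
print(sorted(chosen))
```

Sampling without replacement mirrors drawing slips from a bowl: once a label has been picked, it cannot be picked again.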

Merits

  1. Most straightforward method of probability sampling. 
  2. It is used when we don't have any kind of prior information about the target population.
  3. Each member of the population has an equal and known chance of being selected.
  4. It allows the sampling error to be calculated and reduces selection bias.

Demerits 

  1. We may not select enough individuals with our characteristic of interest, especially if that characteristic is uncommon. 
  2. If sampling units are scattered over a wide geographical area, then it may also be difficult to define a complete sampling frame and inconvenient to contact (email, phone, post). 

2. Systematic Random Sampling

Systematic random sampling technique requires selecting samples based on a system of regular intervals in a numbered population.
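
A minimal sketch of the idea, assuming the common variant in which the interval is k = N // n and the starting point is chosen at random within the first interval. The function name and the numbered population of 100 units are illustrative assumptions.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Pick every k-th unit from an ordered frame after a random start."""
    if not 0 < n <= len(frame):
        raise ValueError("n must be between 1 and the size of the frame")
    k = len(frame) // n            # sampling interval
    rng = random.Random(seed)
    start = rng.randrange(k)       # random start within the first interval
    return [frame[start + i * k] for i in range(n)]


# Hypothetical numbered population of 100 units; a sample of 10 gives interval k = 10.
frame = list(range(1, 101))
picked = systematic_sample(frame, 10, seed=7)
print(picked)
```

Only the starting point is random; every later unit is fixed by the interval, which is why a hidden periodic pattern in the list can distort the sample.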

Merits

  1. Simple technique
  2. Population will be evenly sampled

Demerits

  1. It may lead to order bias if the sampling interval coincides with a periodic pattern in the sampling frame.
  2. Because of this vulnerability to periodicity in the list, the sample may be unrepresentative and less accurate.

3. Stratified Random Sampling

In this sampling technique, the subjects are first grouped into different groups, or strata, that share at least one common characteristic such as age, gender, education level, religion or income. The researcher then randomly selects a sufficient number of subjects from each stratum to form the final sample. It is also known as proportional or quota random sampling. The strata must not overlap.
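
The two stages, grouping into strata and then random selection within each stratum, can be sketched as follows. The example assumes proportional allocation (the same sampling fraction in every stratum); the function name, the `area` attribute and the 60/40 urban-rural split are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(units, stratum_of, fraction, seed=None):
    """Group units into non-overlapping strata, then sample randomly within each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)   # each unit lands in exactly one stratum
    sample = []
    for name in sorted(strata):
        members = strata[name]
        k = max(1, round(len(members) * fraction))  # proportional allocation
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population: 60 urban and 40 rural respondents, stored as (id, area).
units = [(i, "urban") for i in range(60)] + [(i, "rural") for i in range(60, 100)]
sample = stratified_sample(units, stratum_of=lambda u: u[1], fraction=0.1, seed=3)
print(sample)
```

With a 10% fraction the sample contains 6 urban and 4 rural units, so each stratum is represented in the same proportion as in the population.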

Merits

  1. This method is appropriate when the population has mixed characteristics
  2. Useful to study a particular subgroup within the population or highly representative
  3. Superior to random sampling because it reduces sampling error

Demerits

  1. It requires knowledge of the appropriate characteristics of the sampling frame and it can be difficult to decide which characteristics to stratify by. 
  2. It is not useful when there are no homogeneous subgroups. 
  3. Can be expensive to implement 

4. Cluster Random Sampling

Cluster random sampling is done due to the large size of a population or wide geographical region.

Difference between Stratified and cluster sampling

Stratified sampling: the sample includes elements from each stratum.

Cluster sampling: the sample includes elements only from sampled clusters. 

Process

  1. First, identify the boundaries of the population
  2. Then, split the population into groups (clusters) on the basis of age, gender or location
  3. Randomly select a number of the identified groups or areas as clusters having similar characteristics
  4. Either include all the individuals in the selected clusters or select subjects randomly within them
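
The process above can be sketched as a short function. This is an illustration under stated assumptions: the ward names and household labels are hypothetical, and the `per_cluster` parameter is introduced here to switch between taking everyone in a chosen cluster and sub-sampling within it.

```python
import random


def cluster_sample(clusters, n_clusters, per_cluster=None, seed=None):
    """Randomly choose whole clusters, then take all or some units inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)    # step 3: pick clusters at random
    sample = []
    for name in chosen:
        members = clusters[name]
        if per_cluster is None:
            sample.extend(members)                       # step 4a: include everyone
        else:
            size = min(per_cluster, len(members))
            sample.extend(rng.sample(members, size))     # step 4b: sub-sample within
    return chosen, sample


# Hypothetical population split by location into five wards of four households each.
clusters = {f"ward-{w}": [f"ward-{w}/hh-{h}" for h in range(4)] for w in range(1, 6)}
chosen, sample = cluster_sample(clusters, n_clusters=2, seed=5)
print(chosen, sample)
```

Units from the three wards that were not chosen never enter the sample, which is the key difference from stratified sampling, where every stratum contributes.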

Merits

  1. It is easier to contact lots of individuals
  2. Good for dealing with large and dispersed populations
  3. Economical in reducing cost by concentrating on the selected clusters

Demerits

  1. An increased risk of bias or sampling error, if the chosen clusters are not representative of the population

(B.) Non-Probability or Non-Random Sampling Methods

Samples are gathered in a process that does not give all the individuals in the population equal chances of being selected or does not involve random selection. 

Types of Non-Probability or Non-Random Sampling

1. Convenience Sampling

The easiest and most economical method of sampling, because participants are selected based on availability, reach, accessibility and willingness to take part. Convenient sources include, for example, a cinema hall for surveying movie viewers, an email survey, a telephone directory, an industrial directory, or a hospital's record of childbirths. This is a quick and easy way of choosing participants (advantage), but it may not provide a representative sample and could be biased (disadvantage).

2. Judgment Sampling

With judgmental (purposive or deliberate) sampling, the researcher believes that some subjects are more fit or representative for the research than other individuals. Examples: media canvassing the public for opinions, or a coach selecting 15 players out of 30 for the national cricket team. Judgment sampling has the advantage of being time- and cost-effective to perform. However, in addition to volunteer bias, it is prone to errors of judgment by the researcher, which can affect the findings.

3. Quota Sampling

The researcher ensures equal or proportionate representation of subjects from a heterogeneous population, depending on which trait is chosen as the basis of the quota; within each quota, selection is not random.
This method of sampling is often used by market researchers. For example, in a TV viewing survey with population size N = 500 and sample size n = 100, the quotas might be:
  • Adult men = 30 (30%)
  • Adult women = 30 (30%)
  • Teenage girls = 20 (20%)
  • Teenage boys = 20 (20%)
Whilst this has the advantage of being relatively straightforward and potentially representative, the chosen sample may not be representative of other characteristics that weren't considered.
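
Quota filling for the TV viewing example can be sketched as below. The quota sizes come from the example; the function name and the stream of 500 respondents, taken in whatever order they happen to arrive, are hypothetical.

```python
from collections import Counter


def quota_sample(respondents, quotas):
    """Accept respondents in arrival order until every group's quota is full."""
    remaining = dict(quotas)
    selected = []
    for person, group in respondents:
        if remaining.get(group, 0) > 0:
            selected.append((person, group))
            remaining[group] -= 1
        if not any(remaining.values()):
            break                      # all quotas filled, stop recruiting
    return selected


quotas = {"adult man": 30, "adult woman": 30, "teenage girl": 20, "teenage boy": 20}

# Hypothetical arrival order for N = 500 people, cycling through the four groups.
groups = list(quotas)
respondents = [(i, groups[i % 4]) for i in range(500)]

selected = quota_sample(respondents, quotas)
print(Counter(group for _, group in selected))
```

No random draw takes place: whoever turns up first fills the quota, which is why the result matches the chosen trait but may be unrepresentative in other respects.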

4. Snowball Sampling

Snowball (referral) sampling involves finding a small group of initial respondents and using them to recruit more respondents through word of mouth. The referral process continues, increasing the size of the sample like a rolling snowball. It is typically carried out through networks when the desired sample characteristic is rare, or when respondents are unknown or difficult to trace (hidden). While this technique can dramatically lower search costs, it comes at the expense of introducing bias, because the technique itself reduces the likelihood that the sample will represent a good cross-section of the population.
Examples: surveys to gather information about HIV/AIDS, contact tracing for COVID-19, etc.
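
The referral chain can be sketched as a walk over a network of contacts. In this illustration the referral network, the respondent names and the function name are all hypothetical; the sketch simply follows referrals outward from the initial respondents until the target size is reached.

```python
from collections import deque


def snowball_sample(referrals, seeds, target_size):
    """Start from seed respondents and follow their referrals, wave by wave."""
    selected = []
    seen = set()
    queue = deque(seeds)
    while queue and len(selected) < target_size:
        person = queue.popleft()
        if person in seen:
            continue                                  # already recruited via another referral
        seen.add(person)
        selected.append(person)
        queue.extend(referrals.get(person, []))       # this respondent names further contacts
    return selected


# Hypothetical referral network: each respondent lists the people they can introduce.
referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": ["F"]}
print(snowball_sample(referrals, seeds=["A"], target_size=5))
```

Everyone recruited is connected to the initial respondents, so people outside that network have no chance of selection; this is the source of the bias described above.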
