6 Sampling and Non-sampling Errors
The statistical quality or reliability of a survey may obviously be influenced
by the errors that for various reasons affect the observations. Error components
are commonly divided into two major categories: sampling and non-sampling
errors. In the sampling literature the terms "variable errors" and
"bias" are also frequently used, though with precise meanings
slightly different from the former concepts. The total error of
a survey statistic is labeled the mean square error, being the sum of variable
errors and all biases. In this section we first give a fairly general and
brief description of the most common error components related to household
sample surveys, and discuss their presence in and impact on this particular
survey. We then go into more detail on those components which
can be assessed numerically.
Error Components and their Presence in the Survey
(1) Sampling errors are related to the sample design itself and the estimators
used, and may be seen as a consequence of surveying only a random sample
of the population rather than the complete population. Within the family of
probability sample designs - that is, designs enabling the establishment of
inclusion probabilities (random samples) - sampling errors can be estimated.
The most common measure of the sampling error is the variance of an estimate,
or derivatives thereof. The derivative most used is the standard error, which
is simply the square root of the variance.
The variance or the standard error does not tell us exactly how great the
error is in each particular case. It should rather be interpreted as a measure
of uncertainty, i.e. of how much the estimate would be likely to vary if repeatedly
selected samples (with the same design and of the same size) were surveyed.
The variance is discussed in more detail in the subsection "Sampling Error -
Variance of an Estimate" below.
(2) Non-sampling errors form a "basket" comprising all errors which
are not sampling errors. Errors of this type may induce systematic bias
in the estimates, as opposed to the random errors caused by sampling.
The category may be further divided into subgroups according to the various
origins of the error components:
- Imperfections in the sampling frame, i.e. when the population frame
from which the sample is selected does not comprise the complete population
under study, or includes foreign elements. Exclusion of certain groups of
the population from the sampling frame is one example. As described in the
Gaza section, it was decided to exclude "outside localities" from
being surveyed for cost reasons. It was maintained that the exclusion would
have negligible effects on survey results.
- Errors imposed by deviations in implementation from the theoretical sample
design and field work procedures. Examples: non-response, "wrong"
households selected or visited, "wrong" persons interviewed, etc.
Except for non-response, which will be further discussed subsequently, there
were some cases in the present survey in which the standard instructions
for "enumeration walks" had to be modified in order to make sampling
feasible. Any departure from the standard rules was considered specifically
within the context of inclusion probabilities. None of the practical solutions
adopted implies substantial alterations of the theoretical probabilities described
in the previous sections.
- The field work procedures themselves may imply unforeseen systematic
biases in the sample selection. In the present survey one procedure has
been given particular consideration as a potential source of error: the
practical modification of choosing road crossing corners - instead of any
randomly selected spot - as starting points for the enumeration walks. This
choice might impose systematic biases as to the kind of households being
sampled. However, numerous inspection trials in the field indicated that such
bias was highly unlikely to occur. According to the field work instructions,
the starting points themselves were never to be included in the sample.
Such inclusion would have implied a systematic over-representation of road
corner households, and thus may have caused biases for certain variables.
(Instead, road corner households may now be slightly under-represented, in
so far as they, as starting points, are excluded from the sample. Any
bias induced by this under-representation is, however, negligible compared
to the potential bias accompanying the former alternative.)
- Improper wording of questions, misquotations by the interviewer, misinterpretations
and other factors that may cause failure to obtain the intended response.
"Fake responses" (questions answered by the interviewer himself/herself)
may also be included in this group of possible errors. Irregularities of
this kind are generally difficult to detect. The best ways of preventing
them are to have well-trained data collectors, to apply various verification
measures, and to introduce internal control mechanisms by letting data
collectors work in pairs - possibly supplemented by the presence of the
supervisor. A substantial part of the training of supervisors and data collectors
was devoted to such measures. Verification interviews were carried out by
the supervisors among a 10% randomly selected subsample. No fake interviews
were detected. However, a few additional re-interviews were carried out,
on suspicion of misunderstandings and incorrect responses.
- Data processing errors include errors arising incidentally during
the stages of response recording, data entry and programming. In this survey
the data entry programme included consistency controls wherever possible,
aimed at correcting any logical contradictions in the data. Furthermore,
verification punching was applied in order to correct mis-entries not
detected by the consistency controls, implying that each and every questionnaire
has been punched twice.
Sampling Error - Variance of an Estimate
Generally, the prime objective of sample design is to keep the sampling error
at the lowest level possible (within a given budget). There is thus a unique
theoretical correspondence between the sampling strategy and the sampling
error, which can be expressed mathematically by the variance of the estimator
applied. Unfortunately, design complexity quickly renders variance expressions
mathematically uncomfortable and sometimes practically "impossible"
to handle. Therefore, approximations are frequently applied in order to
achieve interpretable expressions of the theoretical variance itself, and
even more so to estimate it.
In real life, practical shortcomings frequently challenge mathematical comfort.
The absence of sampling frames or other prior information forces one to use
mathematically complex strategies in order to find feasible solutions. The
design of the present survey - stratified, 4-5 stage sampling with varying
inclusion probabilities - is probably among the extremes in this respect,
implying that the variance of the estimator (5.2) is of the utmost
complexity, as will be seen subsequently.
The (approximate) variance of the estimator (5.2) is in its simplest form:

\[
\operatorname{Var}(\hat{p}) \approx \frac{1}{N^{2}}\left[\operatorname{Var}(\hat{X}) - 2p\operatorname{Cov}(\hat{X},\hat{N}) + p^{2}\operatorname{Var}(\hat{N})\right] \qquad (6.1)
\]

The variances and covariances on the right hand side of (6.1) may be expressed
in terms of the stratum variances and covariances:

\[
\operatorname{Var}(\hat{X}) = \sum_{s}\operatorname{Var}(\hat{X}_{s}), \qquad
\operatorname{Cov}(\hat{X},\hat{N}) = \sum_{s}\operatorname{Cov}(\hat{X}_{s},\hat{N}_{s})
\]

Proceeding one step further, the stratum variance may be expressed as follows:

\[
\operatorname{Var}(\hat{X}_{s}) = \sum_{k}\sum_{l}\frac{p_{s}(k,l) - p_{s}(k)\,p_{s}(l)}{p_{s}(k)\,p_{s}(l)}\,X_{sk}X_{sl} + \sum_{k}\frac{\operatorname{Var}(\hat{X}_{sk})}{p_{s}(k)}
\]

where we have introduced the notation p_s(k) = P_1(s,k), with the convention
p_s(k,k) = p_s(k). Here p_s(k,l) is the joint probability of inclusion for
PSU (s,k) and PSU (s,l), X_sk is the PSU total, and Var(X̂_sk) is the variance
of the unbiased estimate X̂_sk of X_sk arising from the later sampling stages.
The variance of N̂_s is obtained similarly by substituting X with N in the
above formula. The stratum covariance formula is somewhat more complicated
and is not expressed here.
The PSU (s,k) variance components in the stratum formula above have a structure
similar to the stratum one, as is realized by regarding the PSUs as separate
"strata" and the cells as "PSUs". Again, another variance
component emerges for each of the cells, with a structure similar
to the preceding one. In order to arrive at the "ultimate" variance
expression, yet another two or three similar stages have to be passed. It
should be realized that the final variance formula is extremely complicated,
even if simplifying modifications and approximations may reduce the complexities
stemming from the 2nd-5th sampling stages.
It should also be understood that attempts to estimate this variance properly
and exhaustively (unbiased or close to unbiased) would be beyond any realistic
effort. Furthermore, for such estimation to be accomplished, certain preconditions
have to be met, some of which cannot be satisfied here (for
instance, that at least two PSUs be selected from each stratum comprising
more than one PSU). We thus have to apply a simpler method for appraising
the uncertainty of our estimates.
Any sampling strategy (sample selection approach and estimator) may be characterized
by its performance relative to a simple random sampling (SRS) design applying
the sample average as the estimator for proportions. The design factor of
a strategy is thus defined as the ratio between the variances of the
two estimators. If the design factor is, for instance, less than 1, the
strategy under consideration is better than SRS. Usually, multi-stage
strategies are inferior to SRS, implying a design factor greater
than 1.
The design factor is usually determined empirically. Although there is no
overwhelming evidence in its favour, a factor of 1.5 is frequently used
for stratified, multi-stage designs. (The design factor may vary among survey
variables.) The rough approximate variance estimator is thus:

\[
\widehat{\operatorname{Var}}(\hat{p}) = 1.5\cdot\frac{p(1-p)}{n_T}
\]

where p is the estimate produced by (5.2) and n_T is the number of observations
underlying the estimate (the "100%"). Although this formula oversimplifies
the variance, it still captures some of the basic features of the real
variance: the variance decreases with increasing sample size, and tends
to be larger for proportions around 50% than at the tails (0% or 100%).
The square root of this variance, s = \sqrt{1.5\,p(1-p)/n_T}, is called the
standard error, and is tabulated in table A.12 for various values of p and n.
A short computational illustration follows the table.
Table A.12 Standard error estimates for proportions (s and p are specified as percentages).
Number of obs. (n) | Estimated proportion (p %) |
 | 5/95 | 10/90 | 20/80 | 30/70 | 40/60 | 50 |
10 | 8.4 | 11.6 | 15.5 | 17.7 | 19.0 | 19.4 |
20 | 6.0 | 8.2 | 11.0 | 12.5 | 13.4 | 13.7 |
50 | 3.8 | 5.2 | 6.9 | 7.9 | 8.5 | 8.7 |
75 | 3.1 | 4.2 | 5.7 | 6.5 | 6.9 | 7.1 |
100 | 2.7 | 3.7 | 4.9 | 5.6 | 6.0 | 6.1 |
150 | 2.2 | 3.0 | 4.0 | 4.6 | 4.9 | 5.0 |
200 | 1.9 | 2.6 | 3.5 | 4.0 | 4.2 | 4.3 |
250 | 1.7 | 2.3 | 3.1 | 3.5 | 3.8 | 3.9 |
300 | 1.5 | 2.1 | 2.8 | 3.2 | 3.5 | 3.5 |
350 | 1.4 | 2.0 | 2.6 | 3.0 | 3.2 | 3.3 |
400 | 1.3 | 1.8 | 2.5 | 2.8 | 3.0 | 3.1 |
500 | 1.2 | 1.6 | 2.2 | 2.5 | 2.7 | 2.7 |
700 | 1.0 | 1.4 | 1.9 | 2.1 | 2.3 | 2.3 |
1000 | 0.8 | 1.2 | 1.5 | 1.8 | 1.9 | 1.9 |
1500 | 0.7 | 0.9 | 1.3 | 1.4 | 1.5 | 1.6 |
2000 | 0.6 | 0.8 | 1.1 | 1.3 | 1.3 | 1.4 |
2500 | 0.5 | 0.7 | 1.0 | 1.2 | 1.2 | 1.2 |
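The entries of table A.12 can be reproduced directly from this approximation.
A minimal Python sketch (the function name is ours; the design factor of 1.5
is the one adopted above):

```python
from math import sqrt

DESIGN_FACTOR = 1.5  # rough factor adopted for this stratified, multi-stage design

def standard_error(p_percent, n):
    """Approximate standard error (in percent) of an estimated
    proportion p (in percent) based on n observations."""
    p = p_percent / 100.0
    return 100.0 * sqrt(DESIGN_FACTOR * p * (1.0 - p) / n)

# Reproduce one row of table A.12 (n = 100)
for p in (5, 10, 20, 30, 40, 50):
    print(p, round(standard_error(p, 100), 1))
# -> 2.7, 3.7, 4.9, 5.6, 6.0, 6.1, matching the n = 100 row
```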
Confidence Intervals
The sample which has been surveyed is one specific outcome of an "infinite"
number of random selections which might have been done within the sample
design. Other sample selections would most certainly have yielded survey
results slightly different from the present ones. The survey estimates should
thus not be interpreted as accurately as the figures themselves indicate.
A confidence interval is a formal measure for assessing the variability
of survey estimates from such hypothetically repeated sample selections.
The confidence interval is usually derived from the survey estimate itself
and its standard error:
Confidence interval: [p - c·s, p + c·s]

where c is a constant determined by the choice of a confidence coefficient,
which fixes the probability that the interval includes the true, but unknown,
population proportion for which p is an estimate. For instance, c=1 corresponds
to a confidence probability of approximately 68%, i.e. one would expect about
68 out of 100 intervals to include the true proportion if repeated surveys
were carried out. In most situations, however, a chance of roughly one out of
three of arriving at a wrong conclusion is not considered satisfactory. Usually,
confidence coefficients of 90% or 95% are preferred, 95% corresponding to
approximately c=2. Although the assessment of the location of the true population
proportion thus becomes less uncertain, the assessment itself becomes less
precise as the length of the interval increases.
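A small Python sketch of the interval computation, reusing the standard error
approximation from the previous subsection (function names are ours):

```python
from math import sqrt

def standard_error(p_percent, n, deff=1.5):
    """Approximate standard error (percent), as in the sketch above."""
    p = p_percent / 100.0
    return 100.0 * sqrt(deff * p * (1.0 - p) / n)

def confidence_interval(p_percent, n, c=2.0):
    """Approximate interval [p - c*s, p + c*s], all in percent."""
    s = standard_error(p_percent, n)
    return (p_percent - c * s, p_percent + c * s)

# Example: p = 40%, n = 500 gives s = 2.7 (cf. table A.12) and an
# approximate 95% interval of roughly [34.6%, 45.4%]
print(confidence_interval(40, 500))
```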
Comparisons between groups
Comparing the occurrence of an attribute between different sub-groups of
the population is probably the most frequently used method for making inference
from survey data. For illustration of the problems involved in such comparisons,
let us consider two separate sub-groups for which the estimated proportions
sharing the attribute are p̂1 and p̂2, respectively, while the unknown true
proportions are denoted p1 and p2. The corresponding standard error estimates
are s1 and s2. The problem of inference is thus to evaluate the significance of
the difference between the two sub-group estimates: Can the observed difference
be caused by sampling error alone, or is it so great that there must be
more substantive reasons for it?
We will assume that p̂1 is the larger of the two proportions observed.
Our problem of judgement will thus be equivalent to testing the following
hypothesis:
Hypothesis: p1 = p2
Alternative: p1 > p2
In case the test rejects the hypothesis we will accept the alternative as
a "significant" statement, and thus conclude that the observed
difference between the two estimates is too great to be caused by randomness
alone. However, as is the true nature of statistical inference, one can
(almost) never draw absolutely certain conclusions. The uncertainty of the
test is indicated by the choice of a "significance level", which
is the probability of making a wrong decision by rejecting a true hypothesis.
This probability should obviously be as small as possible. Usually it is
set at 2.5% or 5% - depending on the risk or loss involved in drawing wrong
conclusions.
The test implies that the hypothesis is rejected if

\[
\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{s_1^{2} + s_2^{2}}} > c
\]

where the constant c depends on the choice of significance level:
Significance level c-value
------------------ -------
2.5% 2.0
5.0% 1.6
10.0% 1.3
As is seen, the test criterion comprises the two standard error estimates
and thus implies some calculation. It is also seen that smaller significance
levels require larger observed differences between sub-groups
in order to arrive at significant conclusions. One should be aware that
the non-rejection of a hypothesis leaves one with no conclusion at all,
rather than implying acceptance of the hypothesis itself.
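The criterion is easily computed. A minimal Python sketch, again based on the
design-factor standard error approximation; the example figures are invented:

```python
from math import sqrt

def standard_error(p_percent, n, deff=1.5):
    """Approximate standard error (percent), as in the sketches above."""
    p = p_percent / 100.0
    return 100.0 * sqrt(deff * p * (1.0 - p) / n)

def significant_difference(p1, n1, p2, n2, c=2.0):
    """One-sided test of the hypothesis p1 = p2 against p1 > p2
    (proportions in percent): reject if the criterion exceeds c."""
    s1, s2 = standard_error(p1, n1), standard_error(p2, n2)
    return (p1 - p2) / sqrt(s1 ** 2 + s2 ** 2) > c

# Invented example: 30% of n=300 observations versus 20% of n=400.
# Criterion: 10 / sqrt(3.2**2 + 2.4**2) ~= 2.5 > 2.0, so the
# difference is significant at the 2.5% level.
print(significant_difference(30, 300, 20, 400, c=2.0))
```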
Non-response
Non-response occurs when one fails to obtain an interview with a properly
pre-selected individual (unit non-response). The most frequent reasons for
this kind of non-response are refusals and absence ("not-at-homes").
Item non-response occurs when a single question is left unanswered.
Non-response is generally the most important single source of bias in surveys.
The variables most exposed to non-response bias are those related to the very
phenomenon of being a (frequent) "not-at-homer" or not (for example, cinema
attendance). In Western societies non-response rates of 15-30% are normal.
Various measures were undertaken to keep non-response at the lowest
level possible. Above all, confidence-building was a concern: contacts
were made with local community representatives in order to
enlist their support and approval. Furthermore, many hours were spent
explaining the scope of the survey to respondents and anyone else wanting
to know, giving assurances that the survey makers would neither impose taxes
on people nor demolish their homes, nor - equally important for the reliability
of the survey - bring direct material aid.
Furthermore, up to four call-backs were made if selected respondents were
not at home. Usually the data collectors were able to get an appointment
for a subsequent visit at the first attempt, so that only one revisit was
required in most cases. Unit non-response thus comprises refusals and those
not at home at all after four call-backs.
Table A.13 shows the net number of respondents and non-respondents in each
of the three parts of the survey. The initial sizes of the various samples
can be deduced from the table by adding responses and non-responses: for the
household and RSI samples the total size was 2,518 units, while the female
sample size was 1,247. It is seen from the bottom line that the non-response
rates are remarkably small compared to the "normal" magnitudes
of 10-20% in similar surveys. Consequently, there are fairly good
grounds for maintaining that the effects of non-response in this survey
are insignificant.
Table A.13 Number of (net) respondents and non-respondents in the three parts of the survey
| Households | RSIs | Women |
Region | Resp. | Non-resp. | Resp. | Non-resp. | Resp. | Non-resp. |
Gaza | 970 | 8 | 968 | 10 | 482 | 4 |
West Bank | 1,023 | 16 | 1,004 | 35 | 502 | 14 |
Arab Jerusalem | 486 | 15 | 478 | 23 | 240 | 5 |
Total | 2,479 | 39 | 2,450 | 68 | 1,224 | 23 |
Non-response rate | 1.5% | 2.7% | 1.8% |
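The bottom-line rates follow directly from the table: each rate is the number
of non-respondents divided by the initial sample size (responses plus
non-responses). A minimal sketch of the arithmetic, with the totals transcribed
from table A.13:

```python
# Non-response rates from table A.13: rate = non-resp / (resp + non-resp)
samples = {"Households": (2479, 39), "RSIs": (2450, 68), "Women": (1224, 23)}

for name, (resp, nonresp) in samples.items():
    rate = 100.0 * nonresp / (resp + nonresp)
    print(f"{name}: {rate:.1f}%")   # -> 1.5%, 2.7%, 1.8%
```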