FAFO Report 151

2 Gaza Strip Sample Design

The design adopted is a stratified four- to five-stage procedure with simple random sampling at each stage. Information about the survey population was initially quite scarce, and it seemed that the reliable basic statistics needed for proper sampling would have to be produced during the sampling process itself. Fortunately, by the end of the planning process some of the information most needed for sample allocation had become available as unpublished material from the Gaza statistical office. Population figures that would otherwise have had to be estimated by enumerating some 100 neighbourhoods thus became accessible.

Suitable sampling frames for selecting households (such as a directory or a register of the household population) were not available. The final stages of the sampling procedure were therefore carried out through "on-the-spot sampling", which required maps to be prepared before the data collection stage.

Sampling Details
The complete sample design comprises the following steps:
1 Definition/construction of Primary Sampling Units (PSUs). The PSUs are areas or localities which are easily identified on maps. In most cases a PSU coincides with the administrative concept of a "locality", for which more detailed maps were available.

2 The PSUs were stratified by type of locality (see table A.4). Strata are labeled s (= 1,...,8), and PSUs are labeled k (= 1,2,...). The total number of PSUs in stratum s is denoted K(s). We will use the notation "PSU (s,k)" to denote the k'th PSU of the s'th stratum.
3 A 1st stage sample of PSUs was selected by simple random sampling within each stratum. The number of sample PSUs in stratum s is denoted k(s). The 1st stage sampling fraction for the s'th stratum is

$$ P_1(s,k) = \frac{k(s)}{K(s)} \tag{2.1} $$

which, under simple random sampling, is also the inclusion probability of PSU (s,k). Equation (2.1) implies that all PSUs in the same stratum have an equal chance of being included in the sample.
Table A.4 shows that stratum 8, "Outside localities", was not included in the sample. These areas differ somewhat in character from other localities because they have no municipal authority. On strict scientific grounds it could be argued that this feature would require separate investigation in the present survey. However, the population of these areas is estimated at just 1% of the Gaza Strip total, i.e. approximately 10 sample observations. Furthermore, including these areas would require special measures in the sample design, and field work costs would be significantly higher than elsewhere. Excluding these few observations from the sample would therefore have a negligible impact on aggregate survey results, while making practical sense.

Table A.4 Stratification of primary sampling units (PSUs), 1st stage sample. Gaza Strip

Stratum                             Number of PSUs
No. (s)  Type of locality      Population K(s)   Sample k(s)
1        Gaza City EAST               3                2
2        Gaza City WEST               3                1
3        Towns                        4                2
4        Northern Camps               2                1
5        Middle Camps                 4                1
6        Southern Camps               2                1
7        Villages                     9                3
8        Outside localities           1                0
TOTAL                                28               11
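As a quick check of Table A.4 and of equation (2.1), the minimal Python sketch below recomputes the table totals and the 1st stage sampling fraction P1(s,k) = k(s)/K(s) for each stratum. The names TABLE_A4 and first_stage_fraction are hypothetical. The figures are simply those printed in the table.

```python
# Table A.4: stratum s -> (type of locality, K(s) population PSUs, k(s) sample PSUs)
TABLE_A4 = {
    1: ("Gaza City EAST", 3, 2),
    2: ("Gaza City WEST", 3, 1),
    3: ("Towns", 4, 2),
    4: ("Northern Camps", 2, 1),
    5: ("Middle Camps", 4, 1),
    6: ("Southern Camps", 2, 1),
    7: ("Villages", 9, 3),
    8: ("Outside localities", 1, 0),
}


def first_stage_fraction(s: int) -> float:
    """P1(s,k) = k(s) / K(s): the same value for every PSU k of stratum s."""
    _, K, k = TABLE_A4[s]
    return k / K


total_K = sum(K for _, K, _ in TABLE_A4.values())
total_k = sum(k for _, _, k in TABLE_A4.values())
print(f"Total PSUs: {total_K}, sampled: {total_k}")  # Total PSUs: 28, sampled: 11
for s, (name, K, k) in TABLE_A4.items():
    print(f"{s} {name:<20} P1 = {k}/{K} = {first_stage_fraction(s):.3f}")
```

Stratum 8 has P1 = 0, which reflects its exclusion from the sample.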

4 Each sample PSU was subdivided into cells using maps provided by the local statistical office. At the 2nd stage, cells were selected within each sample PSU by simple random sampling.

Denote by B(s,k) the total number of population cells within PSU (s,k), and by b(s,k) the number of cells included in the sample. The 2nd stage sampling fraction (conditional inclusion probability) for the cells (labeled c) of PSU (s,k) is then:

$$ P_2(s,k,c) = \frac{b(s,k)}{B(s,k)} \tag{2.2} $$

The B(s,k)s were counted by inspecting the maps of the sample PSUs. The inclusion probability does not depend on c, i.e. all cells of the same PSU have an equal probability of being selected. Table A.5 shows the numbers of population and sample cells for each of the PSUs selected at the 1st stage.

5 Because no satisfactory sampling frames existed for the cells, the households to be interviewed had to be selected in the field. We denote by D(s,k,c) the total number of households in cell (s,k,c), and by d(s,k,c) the number to be included in the sample. Rough estimates of D(s,k,c) for the sample cells were provided by the Gaza statistical office.

However, an additional sampling stage had to be introduced before households could be selected: the sampling of housing units. A "housing unit" is a set of one or more households sharing a common main entrance (front door) of a building or compound. Where there were several main entrances, presumably leading to different groups of households, each was regarded as a separate housing unit.

During sample preparation, a procedure for selecting sample households directly was considered. However, inspection of some cells showed that such a procedure would generally be very difficult to implement. The obstacles were complex housing structures, maps lacking detail or not up to date, missing road names and house numbers, and the absence of doorbells or other clues to help identify pre-selected households. Housing units (front doors), by contrast, proved easier to identify than households, even in complex areas.

For each sample cell, an enumeration system was developed for identifying and selecting housing units. In brief, starting points were selected at random. From each of them a uniquely specified "enumeration walk" was initiated, with instructions for the direction of the walk and for how to count housing units. During these walks every 3rd housing unit was selected until the required subsample was obtained, and normally 4-6 housing units were selected per walk. To help the data collectors identify sample housing units, the field work supervisors were thoroughly instructed, both in theory and through training in the field, in the selection of housing units and the preparation of map sketches of every "enumeration walk".

To make them easy to identify in practice, the starting points were normally placed at the corners of road crossings within the cells.
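The selection rule on each walk is a form of systematic sampling with interval 3. The sketch below makes one assumption that the text does not specify: counting begins at the starting point, so the 3rd, 6th, 9th, ... front doors passed are taken until the quota is filled. The function select_along_walk, its parameters and the door labels are hypothetical illustrations.

```python
from typing import List, Sequence


def select_along_walk(units: Sequence[str], interval: int = 3, quota: int = 5) -> List[str]:
    """Select every `interval`-th housing unit passed on an enumeration walk.

    Assumption: counting starts at the starting point, so units at positions
    interval, 2*interval, ... are taken until `quota` units are selected or the
    walk runs out of housing units.
    """
    selected: List[str] = []
    for position, unit in enumerate(units, start=1):
        if position % interval == 0:
            selected.append(unit)
            if len(selected) == quota:
                break
    return selected


walk = [f"door-{i}" for i in range(1, 21)]  # 20 front doors passed on a hypothetical walk
print(select_along_walk(walk))  # ['door-3', 'door-6', 'door-9', 'door-12', 'door-15']
```

A quota of 5 is used here because the text reports 4-6 housing units per walk.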

Denote by H(s,k,c) the total number of housing units within cell (s,k,c), a number that was not available before the field work. As mentioned above, estimates of the total number of households within the sample cells, D(s,k,c), were available, which made it possible to estimate H(s,k,c) once the field work was completed.

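The number of households in housing unit (s,k,c,h) is denoted D(s,k,c,h). Since one household is selected per sample housing unit, each cell contains d(s,k,c) sample housing units. The average number of households per housing unit in cell (s,k,c) can therefore be estimated from the sample data:

$$ \bar{D}(s,k,c) = \frac{1}{d(s,k,c)} \sum_{h=1}^{d(s,k,c)} D(s,k,c,h) \tag{2.3} $$

where h runs over the sample housing units of the cell. An estimate of the total number of housing units in the cell is thus:

$$ \hat{H}(s,k,c) = \frac{D(s,k,c)}{\bar{D}(s,k,c)} \tag{2.4} $$
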
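The sketch below illustrates the estimate (2.4) of the number of housing units in a cell. It first averages the observed D(s,k,c,h) over the sample housing units, then divides the office estimate D(s,k,c) by that average. This is a minimal Python sketch, and the cell figures are hypothetical, not survey data.

```python
from typing import Sequence


def mean_households_per_unit(households_in_sample_units: Sequence[int]) -> float:
    """Average of D(s,k,c,h) over the sample housing units of one cell."""
    if not households_in_sample_units:
        raise ValueError("at least one sample housing unit is needed")
    return sum(households_in_sample_units) / len(households_in_sample_units)


def estimated_housing_units(cell_households: float, households_in_sample_units: Sequence[int]) -> float:
    """H_hat(s,k,c) = D(s,k,c) / D_bar(s,k,c), as in (2.4)."""
    return cell_households / mean_households_per_unit(households_in_sample_units)


# Hypothetical cell: the statistical office estimates D = 120 households, and the
# 10 sample housing units contained these numbers of households.
observed = [1, 1, 2, 1, 3, 1, 1, 2, 1, 2]
print(mean_households_per_unit(observed))      # 1.5
print(estimated_housing_units(120, observed))  # 80.0
```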
Multi-household housing units frequently, though not always, consist of households closely linked through family ties. Such households are thus likely to be more homogeneous than households from different housing units. Selecting more than one household from the same housing unit can therefore be a waste of resources, as the observations may be highly correlated. Selecting only one household per sample housing unit means that more housing units, which are mutually less homogeneous, are included in the sample, and this may reduce the sampling error.

Since just one household is selected from each sample housing unit, the number of housing units to be selected equals the number of sample households, d(s,k,c). The sampling fractions for the two stages involved are thus:
Housing units (3rd stage):

$$ P_3(s,k,c,h) = \frac{d(s,k,c)}{H(s,k,c)} \tag{2.5} $$

Households (4th stage):

$$ P_4(s,k,c,h,d) = \frac{1}{D(s,k,c,h)} \tag{2.6} $$
A special form carrying random numbers was prepared for selecting one household from a housing unit. The form had a separate column for each possible number of households in a housing unit, with that number as the column heading. Each column contained a sequence of random numbers between 1 and the number in its heading. Each random number was to be used only once. For control purposes, the questionnaire number was entered on the form next to the random number used in each case.
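The random-number form can be mimicked in code. The sketch below builds one column per possible household count n, with each column holding random integers from 1 to n. It hands out each number once and logs the questionnaire number beside it, as the control procedure requires. Several details are hypothetical assumptions: the maximum heading (8 households), the column length (50 rows), the seed, and the names make_selection_form and SelectionForm.

```python
import random
from typing import Dict, List, Tuple


def make_selection_form(max_households: int, rows: int, seed: int) -> Dict[int, List[int]]:
    """Hypothetical form generator: one column per possible household count n
    (1..max_households), each holding `rows` random integers between 1 and n."""
    rng = random.Random(seed)
    return {n: [rng.randint(1, n) for _ in range(rows)] for n in range(1, max_households + 1)}


class SelectionForm:
    """Uses each random number once and records the questionnaire number beside it."""

    def __init__(self, columns: Dict[int, List[int]]) -> None:
        self.columns = columns
        self.next_row = {n: 0 for n in columns}
        # Control log: (column heading, row used, random number, questionnaire number)
        self.log: List[Tuple[int, int, int, str]] = []

    def pick(self, n_households: int, questionnaire_no: str) -> int:
        row = self.next_row[n_households]
        number = self.columns[n_households][row]  # raises IndexError once a column is used up
        self.next_row[n_households] = row + 1
        self.log.append((n_households, row, number, questionnaire_no))
        return number


form = SelectionForm(make_selection_form(max_households=8, rows=50, seed=151))
print(form.pick(3, "Q-0001"))  # a number between 1 and 3: the household to interview
```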

Before selecting a household, all households of the housing unit were enumerated. Standard rules for enumeration were:

Enumeration should start from the top floor or top level and proceed downwards.
Households on the same floor/level should be enumerated in clockwise order, starting from the point of entry.
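The two enumeration rules above amount to sorting the households by floor (highest first), then by clockwise position from the entrance. A random number r from the form then picks the r-th household in that order. In the minimal sketch below, the Household record, its floor and clockwise fields, and the example unit are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Household:
    label: str
    floor: int      # 0 = ground floor; higher numbers are higher floors
    clockwise: int  # position on the floor, counted clockwise from the point of entry


def enumerate_households(households: List[Household]) -> List[Household]:
    """Order households top floor first, then clockwise from the point of entry."""
    return sorted(households, key=lambda hh: (-hh.floor, hh.clockwise))


unit = [
    Household("ground, left", 0, 2),
    Household("first floor", 1, 1),
    Household("ground, right", 0, 1),
]
order = enumerate_households(unit)
print([hh.label for hh in order])  # ['first floor', 'ground, right', 'ground, left']
chosen = order[2 - 1]  # random number 2 from the form selects the 2nd enumerated household
print(chosen.label)    # ground, right
```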

6 The respondent to the main questionnaire was to be the Head of Household. If the Head of Household was not available for interview, another household member able to provide the same questionnaire information could respond instead.

7 Sample of Individuals and Females
The sex of the randomly selected individual (RSI) was decided before the field work. This made it possible to allocate female enumerators more efficiently to interview women, which was considered essential to ensure trust and confidentiality. For the same reason, enumerators worked in same-sex pairs.
The sample of individuals ("Randomly Selected Individual's (RSI) questionnaire") as well as the sample of women ("Women's questionnaire") to be interviewed were both derived from the sample of households.

The following procedure was adopted, based on the premise that the proportion of women among the Gaza population of age 15 years or older is close to 50%.

After the household (main) sample had been selected, a 50% subsample was drawn from it. The subsample was selected separately for each cell by simple random sampling. (If the number of interviews in a cell was odd, the "majority sex" was varied from cell to cell so that the accumulated sex proportions over all cells approximated the correct ones.) The selected and non-selected households thus formed two subsamples, one female and one male. The data collectors had specific instructions for deciding who was to interview the various types of respondents.
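The within-cell split into female and male subsamples can be sketched as follows. The text does not state the exact rule for varying the "majority sex", so this sketch assumes a hypothetical balancing rule. When a cell has an odd number of households, the extra one goes to whichever sex is behind in the running totals, with ties going to female. The function assign_rsi_sex and the cell data are illustrative only.

```python
import random
from typing import Dict, List


def assign_rsi_sex(cells: Dict[str, List[str]], seed: int = 0) -> Dict[str, str]:
    """Split each cell's sample households into female ("F") and male ("M") RSI
    subsamples by simple random sampling within the cell.

    Assumed balancing rule (hypothetical): when a cell has an odd number of
    households, the extra one goes to whichever sex is behind in the running
    totals, with ties going to "F"."""
    rng = random.Random(seed)
    totals = {"F": 0, "M": 0}
    assignment: Dict[str, str] = {}
    for households in cells.values():
        n_female = len(households) // 2
        if len(households) % 2 == 1 and totals["F"] <= totals["M"]:
            n_female += 1  # this cell's "majority sex" is female
        female = set(rng.sample(households, n_female))
        for hh in households:
            sex = "F" if hh in female else "M"
            assignment[hh] = sex
            totals[sex] += 1
    return assignment


cells = {"cell-1": [f"c1-h{i}" for i in range(9)], "cell-2": [f"c2-h{i}" for i in range(9)]}
result = assign_rsi_sex(cells, seed=1)
print(sum(v == "F" for v in result.values()), sum(v == "M" for v in result.values()))  # 9 9
```

With two cells of 9 households, the first cell gets 5 women and the second 4, so the accumulated split is exactly even.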
For each sample household, the members aged 15 or older were enumerated by sex, and their numbers are denoted W(s,k,c,h,d) for females and M(s,k,c,h,d) for males.
The members (15+) of the pre-decided sex were then listed in descending order of age.
One of the individuals thus listed was selected by simple random sampling, using a random numbers form specially prepared for the selection of individuals (similar to the household selection form). The 5th stage sampling fractions (selection probabilities for individual (s,k,c,h,d,i)) are thus:

$$ P_5(s,k,c,h,d,i) = \begin{cases} \dfrac{1}{W(s,k,c,h,d)} & \text{if the RSI is to be female} \\[1ex] \dfrac{1}{M(s,k,c,h,d)} & \text{if the RSI is to be male} \end{cases} \tag{2.7} $$

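The 5th stage can be sketched as follows. The eligible members (aged 15 or older, of the pre-decided sex) are listed by descending age, and one is chosen at random. The selection probability is 1/W(s,k,c,h,d) for a female RSI or 1/M(s,k,c,h,d) for a male RSI. The Member tuple, the function select_rsi and the example household are hypothetical. Raising an error when no eligible member exists is only a coding choice, since the text does not say how such households were handled.

```python
import random
from typing import List, Tuple

Member = Tuple[str, str, int]  # (name, sex "F"/"M", age in years)


def select_rsi(members: List[Member], sex: str, rng: random.Random) -> Tuple[Member, float]:
    """Select one member aged 15+ of the pre-decided sex and return it with its
    5th stage selection probability (1/W for females, 1/M for males)."""
    eligible = sorted((m for m in members if m[1] == sex and m[2] >= 15),
                      key=lambda m: m[2], reverse=True)  # listed by descending age
    if not eligible:
        raise ValueError("no eligible member of the pre-decided sex")
    chosen = eligible[rng.randint(1, len(eligible)) - 1]  # random number 1..W (or 1..M)
    return chosen, 1 / len(eligible)


household = [("A", "F", 45), ("B", "F", 19), ("C", "F", 12), ("D", "M", 50), ("E", "M", 17)]
person, p5 = select_rsi(household, "F", random.Random(7))
print(p5)  # 0.5: W = 2 women aged 15+ (the 12-year-old is not eligible)
```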
Sample Allocation
This section describes the calculations needed to allocate the Gaza household sample among the various sample units. The overall inclusion probability for an arbitrary household (s,k,c,h,d) is obtained by multiplying the selection probabilities at each of the first four sampling stages:

$$ P(s,k,c,h,d) = P_1(s,k)\,P_2(s,k,c)\,P_3(s,k,c,h)\,P_4(s,k,c,h,d) = \frac{k(s)}{K(s)} \cdot \frac{b(s,k)}{B(s,k)} \cdot \frac{d(s,k,c)}{H(s,k,c)} \cdot \frac{1}{D(s,k,c,h)} \tag{2.8} $$

As the right-hand side of (2.8) shows, this probability does not depend on the household index d, so all households within the same housing unit have equal inclusion probabilities. For the samples of males and females, which are derived directly from the household sample, the inclusion probabilities are obtained by multiplying the household probabilities by the 5th stage sampling fractions.

In (2.8) the statistics K(s), k(s) and B(s,k) are known. The statistic D(s,k,c,h) is observed from the sample, and H(s,k,c) is estimated from sample data by (2.4). Thus it remains to determine the b(s,k)s and the d(s,k,c)s.
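Equation (2.8) can be evaluated directly once the allocation is fixed and the field data are in, with the estimate of H(s,k,c) from (2.4) standing in for the unknown true value. The sketch below uses exact fractions. Its reciprocal is the quantity commonly used as a base design weight. The function name and the example values are hypothetical. Only K(s), k(s), B(s,k) and b(s,k) for Zaitoun come from Tables A.4 and A.5, while d(s,k,c) = 8, Ĥ(s,k,c) = 80 and D(s,k,c,h) = 2 are made up.

```python
from fractions import Fraction


def household_inclusion_probability(K_s: int, k_s: int, B_sk: int, b_sk: int,
                                    H_skc: float, d_skc: int, D_skch: int) -> Fraction:
    """Equation (2.8): product of the four stage-wise selection probabilities.

    H_skc should be the estimate from (2.4), since the true H(s,k,c) is unknown."""
    p1 = Fraction(k_s, K_s)                  # PSU within stratum
    p2 = Fraction(b_sk, B_sk)                # cell within PSU
    p3 = Fraction(d_skc) / Fraction(H_skc)   # housing unit within cell
    p4 = Fraction(1, D_skch)                 # household within housing unit
    return p1 * p2 * p3 * p4


# Zaitoun (stratum 1): K = 3, k = 2, B = 25, b = 4; cell-level values are hypothetical.
p = household_inclusion_probability(3, 2, 25, 4, 80, 8, 2)
print(p)             # 2/375
print(float(1 / p))  # 187.5
```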

Allocation of Sample of Cells - b(s,k)
The number of cells to be selected from each sample PSU, b(s,k), is determined as follows. The 1st stage sampling fractions, P1(s,k), are already fixed (Table A.4). The 2nd stage fractions, P2(s,k,c), are determined so that:

$$ P_1(s,k)\,P_2(s,k,c) = C_1 \tag{2.9} $$

where C1 is the same constant for all PSUs (s,k). Formula (2.9) makes the selection of cells an epsem design (equal probability selection method): every cell, in every stratum and PSU, has the same overall probability of being selected.

On average, 10 households were to be selected per sample cell. With a total Gaza sample size of 960 households, 96 cells were thus to be selected at the 2nd stage, i.e. the b(s,k)s summed over all sample PSUs amount to 96. (Because of rounding, the actual calculations resulted in 97 cells being selected and a total household sample size of 964.)
Inserting P1(s,k) = k(s)/K(s) and P2(s,k,c) = b(s,k)/B(s,k), expression (2.9) can be rearranged:

$$ b(s,k) = C_1\,\frac{K(s)}{k(s)}\,B(s,k) \tag{2.10} $$

Except for the constant C1, all the quantities on the right-hand side of (2.10) are known. C1 is determined by summing both sides of (2.10) over all sample PSUs (s,k):

$$ \sum_{(s,k)} b(s,k) = C_1 \sum_{(s,k)} \frac{K(s)}{k(s)}\,B(s,k) $$

The left-hand side sums to 96, while the right-hand side equals C1 multiplied by a known factor. C1 is thereby fixed, and the number of sample cells within each sample PSU, b(s,k), is obtained from (2.10) by inserting the respective numbers.
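Tables A.4 and A.5 supply every input needed to check the allocation (2.10) numerically. The Python sketch below assumes that each b(s,k) was rounded to the nearest integer; the report itself speaks only of "numerical approximations". Under that assumption it reproduces the b(s,k) column of Table A.5 and its total of 97 cells. The variable names are illustrative.

```python
# K(s) and k(s) from Table A.4 (sampled strata only); B(s,k) from Table A.5.
K = {1: 3, 2: 3, 3: 4, 4: 2, 5: 4, 6: 2, 7: 9}
k = {1: 2, 2: 1, 3: 2, 4: 1, 5: 1, 6: 1, 7: 3}
PSUS = [
    (1, "Zaitoun", 25), (1, "Shajaeya", 49), (2, "Rimal", 20),
    (3, "West Khan Yunis", 17), (3, "Rafah Town", 31), (4, "Shati Camp", 73),
    (5, "Bureij Camp", 25), (6, "Rafah Camp", 68), (7, "Jabalia Village", 40),
    (7, "Beit Lahia", 11), (7, "Qararah", 8),
]
TARGET_CELLS = 960 // 10  # 960 households at about 10 per cell -> 96 cells

# Summing (2.10) over all sample PSUs: 96 = C1 * sum of K(s)/k(s) * B(s,k)
factor = sum(K[s] / k[s] * B for s, _, B in PSUS)
C1 = TARGET_CELLS / factor

# Assumption: b(s,k) = C1 * K(s)/k(s) * B(s,k), rounded to the nearest integer
allocation = {name: round(C1 * K[s] / k[s] * B) for s, name, B in PSUS}
print(factor, round(C1, 4))       # 826.0 0.1162
print(allocation)
print(sum(allocation.values()))   # 97
```

Rounding is also why the design is only approximately epsem in practice. For Zaitoun, for example, P1·P2 = (2/3)(4/25) ≈ 0.107, slightly below C1 ≈ 0.116.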

Allocation of Sample of Housing Units and Households - d(s,k,c)
As the total number of housing units in each cell, H(s,k,c), was unknown when the sample was allocated, the housing unit sample size for each cell had to be determined indirectly, using the information on the total number of households in the cell, D(s,k,c). A distinction therefore has to be made between the allocation task and the calculation of inclusion probabilities.

To determine the number of housing units to be selected from each sample cell, d(s,k,c), we would require the sample size to be proportional to the total number of households, D(s,k,c), i.e.

$$ \frac{d(s,k,c)}{D(s,k,c)} = C_2 \tag{2.11} $$

Here, C2 would be a constant. Rearranging (2.11) gives:

$$ d(s,k,c) = C_2\,D(s,k,c) \tag{2.12} $$

Summing both sides of (2.12) over all sample cells (s,k,c), the left-hand side adds up to 960, while the right-hand side adds up to a known multiple of C2:

$$ 960 = \sum_{(s,k,c)} d(s,k,c) = C_2 \sum_{(s,k,c)} D(s,k,c) $$

Thus, C2 is determined, and the cell sample size of households is finally calculated by formula (2.12). This way of allocating is what would have been done if households could be selected directly without the intermediate stage of housing unit selection. In this case the household sample would have been an epsem one. However, the introduction of the housing unit stage makes application of an epsem design for household selection practically impossible.
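The household allocation (2.11)-(2.12) is a proportional allocation. First C2 is fixed from the required total, then each cell gets d(s,k,c) = C2·D(s,k,c). The minimal sketch below uses four hypothetical cells and a target of 40 households rather than the real D(s,k,c) estimates, which are not reproduced in this report. The function name is illustrative. Rounding each cell separately, as done here, can make the total drift slightly from the target.

```python
from typing import Dict


def allocate_households(cell_households: Dict[str, float], total_sample: int) -> Dict[str, int]:
    """d(s,k,c) = C2 * D(s,k,c), with C2 = total_sample / sum of D(s,k,c),
    each cell size rounded to the nearest integer (an assumption)."""
    C2 = total_sample / sum(cell_households.values())
    return {cell: round(C2 * D) for cell, D in cell_households.items()}


# Hypothetical D(s,k,c) estimates for four cells; the Gaza design used a total of 960.
print(allocate_households({"c1": 120, "c2": 80, "c3": 150, "c4": 50}, 40))
# {'c1': 12, 'c2': 8, 'c3': 15, 'c4': 5}
```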

Table A.5 shows the number of population and sample cells for each of the sample PSUs. The aggregate PSU household sample size, i.e. the sum of sample households over all sample cells within each sample PSU, is also displayed.

Table A.5 Population and sample number of cells, and aggregate household sample size (d(s,k)) for each of the sample PSUs. Gaza Strip

                                     Number of cells      PSU household
Stratum  Sample PSU                 Population  Sample     sample size
s        (s,k)                        B(s,k)    b(s,k)       d(s,k)
1        Zaitoun N 105                  25         4            29
         Shajaeya N 106                 49         9           114
2        Rimal N 104                    20         7            44
3        West Khan Yunis S 109          17         4            52
         Rafah Town S 110               31         7            97
4        Shati Camp N 103               73        17           120
5        Bureij Camp S 107              25        12           122
6        Rafah Camp S 111               68        16           154
7        Jabalia Village N 102          40        14           157
         Beit Lahia N 101               11         4            40
         Qararah S 108                   8         3            35
TOTAL                                  367        97           964

al@mashriq                       960428/960710