We have a question about cluster-robust standard errors under (1) a simple random sample design as opposed to (2) a cluster sampling design.


Assume for (1) a population of 10,000 people who live in 20 equal-sized villages with 500 inhabitants each. We draw a simple random sample of 600 people from the total population.

After looking at the sample data, we find that the sample contains observations from all 20 villages and that errors are clustered at the village level.
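
To make the numbers in design (1) concrete, here is a minimal arithmetic sketch in Python (not Stata). The variable names are ours and purely illustrative; it only restates the figures given above and shows that, on average, we would expect about 30 observations per village.

```python
# Design (1): simple random sample of 600 people from 20 villages of 500 each.
# All names below are illustrative; the figures come from the description above.
N_VILLAGES = 20
VILLAGE_SIZE = 500
SRS_SAMPLE_SIZE = 600

population = N_VILLAGES * VILLAGE_SIZE                 # total population size
srs_fraction = SRS_SAMPLE_SIZE / population            # overall sampling fraction
expected_per_village = SRS_SAMPLE_SIZE / N_VILLAGES    # average sampled people per village

print(f"population: {population}")
print(f"overall sampling fraction: {srs_fraction:.2%}")
print(f"expected observations per village: {expected_per_village:.0f}")
```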

Question 1: Is the conventional vce(cluster clustvar) option adequate for estimating cluster-robust standard errors in this case? We ask because the total number of clusters is known and small (20), whereas the conventional assumption is that the number of clusters is large, and because we have observations from all clusters.
Question 2: If vce(cluster clustvar) is not correct, which Stata command should we use?

Assume for (2) again the same population of 10,000 people living in 20 equal-sized villages with 500 inhabitants each. To save money, we now use a two-stage cluster sampling design: we first randomly select 10 villages (PSUs) and then randomly draw a sample of 60 inhabitants (SSUs) in each of the 10 selected villages. Hence, the sample size is again 600. After looking at the data, we again find that errors are clustered at the village level.

Question 3: Which Stata command should we use for this two-stage cluster sampling design?
Question 4: Of course, there is svy, but wouldn't we need a finite population correction (or even two) because we sampled a large fraction of PSUs (50%) and a large fraction of SSUs (12%) within the selected PSUs?
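
To show where the 50% and 12% in Question 4 come from, here is a minimal arithmetic sketch in Python (not Stata). The helper names are ours and hypothetical. The factor 1 - n/N is the textbook finite population correction for sampling without replacement at a single stage; we include it only to illustrate the size of the fractions, and we do not claim that this is how svy combines the two stages.

```python
# Design (2): two-stage cluster sample, 10 of 20 villages, then 60 of 500 people.
# Helper names are hypothetical; the figures come from the description above.

def sampling_fraction(n_sampled: int, n_total: int) -> float:
    """Share of units drawn at one sampling stage."""
    return n_sampled / n_total

def fpc_factor(n_sampled: int, n_total: int) -> float:
    """Textbook single-stage variance multiplier (1 - n/N), assuming
    sampling without replacement. Shown for illustration only."""
    return 1.0 - sampling_fraction(n_sampled, n_total)

psu_fraction = sampling_fraction(10, 20)    # villages selected
ssu_fraction = sampling_fraction(60, 500)   # inhabitants per selected village
two_stage_sample_size = 10 * 60             # total observations

print(f"PSU sampling fraction: {psu_fraction:.0%}")
print(f"SSU sampling fraction: {ssu_fraction:.0%}")
print(f"stage-1 factor (1 - n/N): {fpc_factor(10, 20):.2f}")
print(f"stage-2 factor (1 - m/M): {fpc_factor(60, 500):.2f}")
print(f"total sample size: {two_stage_sample_size}")
```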