Hi everyone,

I'm working on a project for my economic development course and am having difficulty determining what code to use for the computations described below. The dataset that I am working with covers a deworming program that was implemented in Kenya during 1998 and 1999. There are three sets of computations that I want to do, and I am unsure what code to use for each of them.

I have the following variables in my dataset:
  1. pupid: pupil index number.
  2. pill98: dummy variable for if the child took the deworming pill in 1998.
  3. pill99: dummy variable for if the child took the deworming pill in 1999.
  4. grade98: the pupil's grade in 1998.
  5. sex: dummy variable, 1 for male and 0 for female.
  6. old_girl98: girls aged 13 or older in 1998.
  7. totpar98: average school participation in 1998.
  8. wgrp: assigned worm group (1, 2 or 3) in 1998.
  9. treat_sch98: represents a school that was assigned to the deworming program in 1998.
  10. infect_early99: moderate-heavy worm infection in early 1999.
1) First, I want to compute the difference-in-differences estimate through a regression on the pre- and post-treatment attendance records for all schools. I am not sure how to set up this regression or how to read the difference-in-differences estimate from it.
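
For reference, one common way to set this up is to interact the treatment dummy with a post-period dummy. The lines below are only a sketch under strong assumptions: they presume the data are in long format (one row per pupil per period) and that variables named totpar (participation), post (1 for the post-treatment period) and schid (school identifier) exist. None of these three names appear in my variable list, so they are hypothetical placeholders.

Code:
* ASSUMPTION: long-format data; totpar, post and schid are hypothetical names
* that do not appear in the variable list above.
* treat_sch98##post expands to both main effects plus their interaction.
regress totpar i.treat_sch98##i.post, vce(cluster schid)
* The coefficient on 1.treat_sch98#1.post is the difference-in-differences estimate.

Clustering on the school is used here because treatment was assigned at the school level; with a different data layout the specification would need to change.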

2) Next, there are a few computations that I want to make and am not fully confident about the code:
  1. I want to see how many observations there are per pupil, and I'm assuming all I need to do is enter
    Code:
    su pupid
    to summarize the variable and see how many observations there are for it.
  2. I want to determine the percentage of pupils that are boys. I believe that what I want to do is drop the observations where the sex dummy equals zero and then divide the number of remaining observations by the total number of pupils. I'm just not sure how to drop the observations where the dummy equals 0. If there is an alternative method of computing this, I'm more than happy to discuss it as well.
  3. I want to see what percentage of schools in 1998 were selected as part of the deworming program. Although this is relatively straightforward, I'm not sure how I can compute this with the given variables.
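
A possible sketch of these three descriptive computations follows. It assumes that sex and treat_sch98 are coded 0/1, so that their means are proportions, and it uses a hypothetical school identifier schid that is not in my variable list.

Code:
* 1. Observations per pupil: _N within each pupid group is the row count for that pupil.
bysort pupid: generate n_obs = _N
tabulate n_obs

* 2. Share of boys: flag one row per pupil so repeated pupils are not double counted,
*    then take the mean of the 0/1 dummy (no need to drop any observations).
egen pupil_tag = tag(pupid)
summarize sex if pupil_tag
tabulate sex if pupil_tag

* 3. Share of schools treated in 1998.
* ASSUMPTION: schid is a hypothetical school identifier, not listed above.
egen school_tag = tag(schid)
summarize treat_sch98 if school_tag

The mean reported by summarize is the proportion; multiplying by 100 gives the percentage, and tabulate shows the same figure in its Percent column.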
3) Finally, I want to calculate a series of different outcomes (i.e. treatment effect) based on the data I have. I want to compute the following:
  1. Students' participation in school conditional on whether they took deworming pills in 1998--i.e. the difference between students' school participation with and without the deworming pill.
  2. The difference in school participation between students in treatment schools and students in non-treatment schools in 1998.
  3. The difference in the probability of taking the pill given that a student was in a treatment school and the probability of taking it if a student was not in a treatment school.
  4. The Wald Estimator.
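
To make these four quantities concrete, here is a minimal sketch, assuming totpar98 is the outcome and pill98 and treat_sch98 are 0/1 dummies. It is one possible approach rather than a definitive specification; robust standard errors are an illustrative choice, and clustering by school would need a school identifier that is not in my variable list.

Code:
* 1. Participation difference by pill status (naive comparison)
regress totpar98 pill98, vce(robust)

* 2. Participation difference by school treatment status (intention to treat)
regress totpar98 treat_sch98, vce(robust)
scalar itt = _b[treat_sch98]

* 3. Difference in the probability of taking the pill (first stage)
regress pill98 treat_sch98, vce(robust)
scalar fs = _b[treat_sch98]

* 4. Wald estimator: ratio of the intention-to-treat effect to the first stage
display "Wald estimate: " itt/fs

* With a single binary instrument and no covariates, the 2SLS coefficient
* on pill98 is the same Wald ratio, and it comes with a standard error.
ivregress 2sls totpar98 (pill98 = treat_sch98), vce(robust)

If totpar98 and pill98 are missing for different pupils, the hand-computed ratio and the ivregress coefficient can differ slightly because the two regressions use different samples.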
To be completely honest, I am not very well versed in Stata and often have difficulty with these types of computations given my limited exposure to it. Any help from the forum would be much appreciated.

Thanks!