Class 3: Wrapping Up What We Have Learned so Far

In this class you will solve five short exercises that apply most of the commands we have learned so far. I do not expect you to finish all of them during the class, but try to get as far as you can. The homework is to finish whatever exercises you did not complete in class.

Set up

You are starting your first day at the World Bank after completing your graduate studies. Your first assignment is to analyze poverty dynamics in the US since the late 1970s, for which you are given access to a project folder you can download here.

After being ushered to your office, you notice you already have an email from your boss in your inbox:

Hi Colleague:

Welcome on board! You have been commissioned to work on an analysis of poverty dynamics in the US since the late 1970s.

Tomorrow I will introduce you to the rest of the team in our weekly meeting. It would be great if you could have some early results to share with them by then. Using the data from the NLSY79 survey available in the project's folder, you should calculate the poverty rate, the probability that a person who was poor in year t-1 moves out of poverty in year t (conditional probability of moving out of poverty), and the probability that a person who was not poor in year t-1 moves into poverty in year t (conditional probability of falling into poverty).

Besides general trends, we would also like to have the results by groups defined by gender and race (Hispanic males, Hispanic females, black males, black females, non-Hispanic non-black males, non-Hispanic non-black females), as well as by groups defined by educational achievement (high school dropouts, high school graduates, incomplete college, and college graduates).

Thank you, see you tomorrow!
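
The three quantities requested in the email can be written down precisely. The notation below is a sketch and not part of the original assignment: it assumes an indicator $p_{it}$ equal to 1 if respondent $i$ is poor in year $t$ and 0 otherwise, $N_t$ respondents observed in year $t$, and unweighted averages (the email does not say whether survey weights should be used).

```latex
% Poverty rate in year t
\text{PovRate}_t = \frac{1}{N_t} \sum_{i} p_{it}

% Conditional probability of moving out of poverty
\Pr(p_{it} = 0 \mid p_{i,t-1} = 1)
  = \frac{\sum_{i} (1 - p_{it})\, p_{i,t-1}}{\sum_{i} p_{i,t-1}}

% Conditional probability of falling into poverty
\Pr(p_{it} = 1 \mid p_{i,t-1} = 0)
  = \frac{\sum_{i} p_{it}\, (1 - p_{i,t-1})}{\sum_{i} (1 - p_{i,t-1})}
```

In the two conditional probabilities the sums run only over respondents whose poverty status is observed in both year $t-1$ and year $t$.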

Exercise 1: Importing and saving the data

In your project folder, all the data you need is in the data/original directory. This data has been downloaded from the NLSY79 webpage and comes in a Stata-friendly format. You should write a do file that imports each dataset and labels the variables by running the do file that does this job in the folder of each database. Take a look at one of these do files to see how variables and their values are labeled. Save each database in your data/processed directory. Inspect the databases and ask your instructor if you have any questions.
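
To make the labeling step concrete, here is a minimal sketch of what such a do file does to a variable. It runs on toy data typed in with input; the variable names, codes, and label text are hypothetical and are not taken from the NLSY79 files.

```stata
* Toy data standing in for one imported file (all names and codes are hypothetical)
clear
input id sex
1 1
2 2
3 1
end

* Label the variable itself
label variable sex "Sex of respondent"

* Define a set of value labels, then attach it to the variable
label define sex_lbl 1 "Male" 2 "Female"
label values sex sex_lbl

* Save the labeled data; in the project this would be a file in data/processed
tempfile labeled
save `labeled'

describe
tabulate sex
```

In the project itself, the save command would point to a file in data/processed instead of a temporary file.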

Tips for exercise 1:

Exercise 2: Calculating the general trend

In this part you have to write a do file that calculates the poverty rate and both conditional probabilities (out of poverty and into poverty) for each year in the sample. The output of the do file should be a .dta file and an Excel sheet that show, for each year, the fraction of people who are poor, the probability that a non-poor person falls into poverty, and the probability that a poor person gets out of poverty.
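
One possible way to organize this calculation is sketched below on toy data. It assumes the data are in long format (one row per person and year) with hypothetical variables id, year, and poor (1 = poor, 0 = not poor), uses unweighted means, and writes to hypothetical file names in the current directory.

```stata
* Toy panel: 4 people observed in 3 consecutive years (made-up values)
clear
input id year poor
1 1990 1
1 1991 0
1 1992 0
2 1990 0
2 1991 1
2 1992 1
3 1990 1
3 1991 1
3 1992 0
4 1990 0
4 1991 0
4 1992 0
end

* Declare the panel so that L.poor is the same person's status one year earlier
xtset id year

* Exit indicator: defined only for people who were poor in t-1
generate exit_poverty = (poor == 0) if L.poor == 1 & !missing(poor)

* Entry indicator: defined only for people who were not poor in t-1
generate enter_poverty = (poor == 1) if L.poor == 0 & !missing(poor)

* The mean of a 0/1 indicator is a share; collapse ignores missing values
collapse (mean) poverty_rate = poor p_exit = exit_poverty ///
    p_enter = enter_poverty, by(year)

list, clean

* Hypothetical output file names
save "poverty_trend.dta", replace
export excel using "poverty_trend.xlsx", firstrow(variables) replace
```

The L. operator looks back exactly one year, so the two indicators are missing for any wave that is more than a year after the previous one. If you would rather compare each wave with the previous available wave, bysort id (year): generate lag_poor = poor[_n-1] is an alternative.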

Tips for exercise 2:

Exercise 3: Calculating trends for groups defined by race and gender

This exercise asks you to repeat what you did for exercise 2, but now the results should be calculated for each group defined by race and gender. The output of the do file should be a .dta file and an Excel sheet that show the results for each group in separate columns.
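
The new ingredients here are building the groups and placing each group in its own column. The sketch below shows one way to do both for the poverty rate only, on toy data; the variables race and sex and their codes are hypothetical, so check the actual coding in your labeled data.

```stata
* Toy data: 4 people, 2 years, two race-by-sex groups (made-up values)
clear
input id year poor race sex
1 1990 1 1 1
1 1991 0 1 1
2 1990 0 1 1
2 1991 0 1 1
3 1990 1 2 2
3 1991 1 2 2
4 1990 0 2 2
4 1991 1 2 2
end

* One numeric identifier per race-by-sex combination (1, 2, ... in sorted order)
egen group = group(race sex)

* Poverty rate by year and group
collapse (mean) poverty_rate = poor, by(year group)

* One column per group: poverty_rate1, poverty_rate2, ...
reshape wide poverty_rate, i(year) j(group)

list, clean
```

The same collapse and reshape steps extend to the two conditional probabilities by adding them to the collapse and reshape variable lists.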

Tips for exercise 3:

Exercise 4: Calculating trends for groups defined by educational achievement

This exercise is similar to the previous one, except that the educational achievement variable requires some additional adjustments. The output of your do file is analogous to the one required in the previous exercise.
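
The exercise does not spell out which adjustments are needed. One plausible adjustment, sketched below, is collapsing a years-of-schooling variable into the four groups from the email. The variable name hgc, the cutoffs (12 and 16 years), and the treatment of negative codes as missing are all assumptions to check against the data.

```stata
* Toy data: hypothetical highest grade completed, with a negative code for missing
clear
input id hgc
1 10
2 12
3 14
4 16
5 -5
end

* Map years of schooling into four labeled groups; negative codes become missing
recode hgc (min/-1 = .)                      ///
    (0/11   = 1 "High school dropout")       ///
    (12     = 2 "High school graduate")      ///
    (13/15  = 3 "Incomplete college")        ///
    (16/max = 4 "College graduate"),         ///
    generate(educ_group)

tabulate educ_group, missing
```

Once educ_group exists, it can take the place of the race-by-gender group variable from exercise 3.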

Tips for exercise 4:

Exercise 5: Comparing the general trend between the NLSY79 and the NLSY97 samples

Good job! You arrived at your meeting with all the information that was required and made an excellent first impression. During the meeting, a former classmate from your graduate program raised a smart concern about your analysis: do these numbers reflect a general trend in the US economy or the life cycle of the NLSY79 generation? To partially address this point, you said you would run the same analysis with the NLSY97 sample and compare the results for the years in which both samples overlap. If they are very similar, your analysis is probably reflecting the general trend in the US economy. If they are very different, and the NLSY97 figures resemble those the NLSY79 generation had when they were young, your analysis is likely reflecting the dynamics of poverty over the life cycle.

Write a do file that calculates the general trend for the NLSY97 survey (download the data here) and merges this result with the results you calculated for the NLSY79 survey.
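
The merging step could look like the sketch below. The two small datasets are typed in by hand with made-up numbers (they are not NLSY estimates), and the variable names are hypothetical; in your do file they would be the yearly results you saved for each survey.

```stata
* Made-up yearly results standing in for the NLSY97 output
clear
input year poverty_rate
1997 0.20
1998 0.18
1999 0.17
end
rename poverty_rate poverty_rate_97
tempfile trend97
save `trend97'

* Made-up yearly results standing in for the NLSY79 output
clear
input year poverty_rate
1996 0.12
1998 0.11
end
rename poverty_rate poverty_rate_79

* One row per year in each file, so this is a 1:1 merge on year
merge 1:1 year using `trend97'
sort year

* _merge == 3 marks the years present in both surveys
list if _merge == 3, clean
```

Renaming the variables before merging keeps the two surveys side by side in separate columns instead of one overwriting the other.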

Tips for exercise 5:

You can download the solutions to these exercises here (please try to solve the exercises by yourself before looking at the solutions).