Stata MPA-ID Tutorial

Welcome to the Stata MPA-ID Tutorial!

The goal of this short course is to teach you how to produce reproducible research using Stata. The course consists of five classes that build on each other, from the very basics to advanced topics. The approach is 100% practical, which means you will be working with real-world data in Stata from the very beginning.

In each class, we will complete brief projects that will introduce you to different techniques in Stata. After each class, you will be asked to complete a project on your own to cement the methods we have learned.

Why Stata?

At this point, you may be wondering why, out of all the statistical software packages available, you should learn Stata. Over the past decades, Stata has become a shared standard among economists and public policy experts for producing and reproducing empirical work on academic and policy challenges. As such, feeling comfortable with Stata is an important part of the toolbox you will acquire throughout your program to become a public policy expert.

You might also have heard of R or Python, programming languages that you can use to conduct empirical research and that are becoming more and more common among scholars and practitioners. If you are excited about learning how to use them, Stata is an excellent first step toward that goal. Stata is more user-friendly, but it shares many features with other programming languages. So, after mastering Stata, learning R or Python will be much easier!

This Course

This website gives you a place to come back to and review the key concepts of each class, but most of the work will be done in Stata. In-class exercises and homework assignments are embedded in this website. Please provide your name below to personalize the instructions for the exercises and homework (you will only need to enter your name once):

Class 1: From Excel to Stata

This class will introduce you to Stata from scratch. We will open a dataset and learn how to interact with it using the Stata command line. We will conclude by introducing do files.

By the end of the class you will know:

  • What the main windows of Stata are and what they are used for.
  • How to give Stata instructions to obtain useful information about a dataset.
  • How to use a log file to save the results of your analysis.
  • How to use a do file to make your work reproducible.

We will learn these skills by analyzing socioeconomic disparities in exposure to gas leaks, and in how promptly those leaks are repaired, in the public streets of Boston and Cambridge, combining data reported by the utility companies with data from the 2010 Census.
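
As a rough preview of how a log file and a do file fit together, here is a minimal sketch. It is not the course material: it uses the `auto` example dataset that ships with Stata as a stand-in for the gas leak data, and the log file name is a placeholder.

```stata
* class1_sketch.do: hypothetical do file; the file names are placeholders
clear all
capture log close                       // close any log left open by an earlier run
log using "class1_sketch.log", replace text

sysuse auto, clear                      // example dataset shipped with Stata (stand-in)
describe                                // variable names, storage types, and labels
summarize price mpg                     // basic summary statistics
tabulate foreign                        // frequency table of a categorical variable

log close                               // everything above is now saved in the log
```

Running the whole file with `do class1_sketch.do` repeats every step in the same order, which is what makes the work reproducible.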

Class 2: Manipulating the Data

This class will introduce you to the commands you need to get from a raw dataset to a clean one that is ready for analysis.

By the end of the class you will know:

  • How to combine datasets horizontally (merge) and vertically (append).
  • How to manipulate string variables.
  • How to apply complex transformations to your variables.
  • How to work with longitudinal data (units through time).

We will learn these skills by analyzing data on prompt payment for hospitals in Chile.
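
To give a feel for merge, append, and string functions, the sketch below again uses Stata's built-in `auto` dataset rather than the hospital data. The lookup table, the variable names `origin` and `brand`, and the temporary file names are all made up for illustration. Because it relies on local macros, it should be run in one go from a do file.

```stata
sysuse auto, clear

* String manipulation: take the first word of the model name as the brand
generate brand = word(make, 1)

* Build a small lookup table with one row per value of foreign
preserve
keep foreign
duplicates drop
generate str8 origin = cond(foreign == 1, "import", "domestic")
tempfile origins
save `origins'
restore

* Horizontal combination: many cars match one row of the lookup table
merge m:1 foreign using `origins'
tabulate _merge                         // check how many observations matched
drop _merge

* Vertical combination: split the data in two and stack it back together
preserve
keep if foreign == 1
tempfile imports
save `imports'
restore
keep if foreign == 0
append using `imports'
count
```

Checking `_merge` after every merge is a useful habit, since unmatched observations are a common source of silent errors.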

Class 3: Wrapping Up What We Have Learned so Far

In this class you will work on a project that requires the skills we have learned so far. The project consists of analyzing how transitions into and out of poverty have evolved in the US since the late 1970s, using the NLS79 survey.

Class 4: Analyzing the Data (Part 1)

During this class and the next, you will learn what you need to perform an efficient analysis of the data in Stata.

By the end of the class you will know:

  • How to download World Bank data directly from Stata.
  • How to manipulate your data like a pro, using loops, global and local macros, temporary variables, temporary files, and conditional execution.
  • How to store values in matrices for later use.
  • How to conduct basic statistical tests (t-test).

We will learn these skills by analyzing the main drivers of CO2 emissions across the world using the World Bank open database.
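
The following sketch shows how a loop, a local macro, a matrix, conditional execution, and a t-test can work together. It assumes nothing about the World Bank data: it runs on Stata's built-in `auto` dataset, and the matrix name `means` is arbitrary. Run it as a single do file so the local macros stay defined.

```stata
sysuse auto, clear

local vars price mpg weight             // local macro holding a variable list
matrix means = J(3, 1, .)               // 3 x 1 matrix filled with missing values
matrix rownames means = `vars'

* Loop over the variables and store each mean in the matrix
local i = 1
foreach v of local vars {
    quietly summarize `v'
    matrix means[`i', 1] = r(mean)      // r(mean) is returned by summarize
    local ++i
}
matrix list means

* Conditional execution: report missing values only if there are any
quietly count if missing(rep78)
if r(N) > 0 {
    display "rep78 has " r(N) " missing values"
}

* Two-sample t-test of mpg between domestic and foreign cars
ttest mpg, by(foreign)
```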

Class 5: Analyzing the Data (Part 2)

This class starts by finishing what was left over from Class 4: graphical and regression analysis. We will then conclude with a long exercise to put into practice what you have learned in this course.

By the end of the class you will know:

  • How to make high-quality plots.
  • How to make high-quality descriptive tables.
  • How to run regressions.
  • How to export your regression results to Word or LaTeX.

We will learn these skills by continuing the analysis of the World Bank open data. We will also have a long exercise to recap the course, in which you will analyze data on the budget execution of the Chilean central government.
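
As a small illustration of the plotting, descriptive table, and regression steps, here is a sketch on Stata's built-in `auto` dataset. The variable choices and the output file name are placeholders, not the course's own analysis, and exporting regression tables to Word or LaTeX is left out because the tools for it vary.

```stata
sysuse auto, clear

* Scatter plot with a fitted line, saved to disk (file name is a placeholder)
twoway (scatter price mpg) (lfit price mpg), ///
    title("Price and mileage") ytitle("Price (USD)") xtitle("Miles per gallon")
graph export "price_mpg.png", replace

* Descriptive table by group
tabstat price mpg weight, statistics(mean sd min max) by(foreign)

* Linear regression with robust standard errors
regress price mpg weight i.foreign, vce(robust)
```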