Class 5: Analyzing the Data (Part 2)

In this class we will learn how to make high-quality plots and regression analysis in Stata. In both cases, Stata offers a huge variety of options, and therefore this class does not pretend to be an exhaustive revision of all the cool plots and regressions models in Stata. The goal is to show the basic logic behind making plots and running regressions in Stata, and show you to export the results of your regressions to Word or Latex in beautiful tables you will be proud of including in your work. We will use the same database we used in class 4 to show you how plots and regressions work.

You will asked to solve a long exercise that puts into practice the skills you have acquired in this course. You will be provided with data on expenditures from the Chilean central government, and analyze the problem of year-end expenditure spikes.

Plots

Please download the materials for this class here and open the plots.do file. Carefully read the examples that explain the basic syntaxis of plots.

Regressions

Open the regressions.do file. Carefully read the examples that explain the basic syntaxis of regressions and how to save your results.

Exercise: Characterizing Year-end Spikes in Governments' Expenditures

You are a data analyst at the Fiscal Affairs Department (FAD) of the International Monetary Fund. The FAD is very interested in the results from a recent paper by professors Jeffrey Liebman (Harvard Kennedy School of Government) and Neale Mahoney (Chicago Booth School of Business), that proposes an explanation for the widespread phenomena known as year-end spikes in expenditure. The issue at hand is that public agencies across the world tend to spend a disproportionate share of their budget at the very end of the year. The standard explanation for such a behavior puts the blame on expiring budget, that is, the rule that you loose the monies you don't spend. However, to a first approximation expiring budgets only explain why organizations exhaust their budgets, but no why they have a spike at the end of the year. Professors Liebman and Mahoney offer an explanation for why expiring budgets might lead to a spike in expenditure at the end of the year: bureaucrats face uncertainty in the value of public expenditure in the future, so they refrain from spending at the beginning of the year to build a buffer against potential emergencies (rainy day savings). But emergencies only rarely occur, so most of the times bureaucrats find themselves at the end of the year sitting over a pile of moneys they didn't expend. That is why they rush to spend at the end of the year.

You are part of a team of young professionals that are working on characterizing the problem of year-end spikes in governments' expenditure and conducting an early exploratory empirical analysis of the plausibility of professors Lieabman and Mahoney's theory. You have been commissioned to work on the case of Chile, for which you have been given access to data available in the folder for this class.

Please follow the following road map to complete this assignment:

Step 1: Building the Database

The Chilean office of budget planning gave you access to the monthly expenditures of all public agencies of the Chilean Central Government between 2011 and 2013. The data is in the data/original folder, and it consists in one csv file for each public agency (there are 198 agencies). You have to create a unique Stata database from all these files, with each row in the database representing an agencies' accrued expenditure in a particular year, month and budget subtitle (these are categories within the budget, as for example expenditure in personnel, expenditure in goods and services, etc). Don't worry, this will not be tedious with the right tools! Open the build-database-template.do file in the scripts folder and follow the instructions.

Step 2: Creating Unique ID's for Agencies and Subtitles

Now we have a unique database with all the information we need, well done! We want to generate numeric ID's for each agency and each subtitle. This values are part of two string variables, the file variable created when generating the database in the previous step, and the subtitle variable. Open the create-ids-template.do file in the scripts folder and follow the instructions to extract the ID's from these string variables.

Step 3: Characterizing Year-end Spikes With Plots

This is where the fun begins! You should write a code that opens the database you have generated and produces the following four figures. You can find inspiration in the file plots.do. A brief comment on how to manipulate the data to create the figure and why the figure is relevant for the problem precedes each figure.

  1. Fraction of budget accrued each month, by year: This graph shows that the phenomena we are interested is real: December stands as the month with the largest share of expenditure every year. To produce this graph, you should first add the accrued expenditures and the budget by year and month.

  2. Fraction of budget accrued each month, by subtitle: This graph shows that the year-end spike is not equally strong across subtitles. To produce this graph, you should first add the accrued expenditures and the budget by subtitle and month.

  3. Ranking ministries by year-end spike: This graph shows that the year-end spike is not equally strong across ministries. To produce this graph, you should keep only observations from December, and add the accrued expenditures and the budget by agency ID. Then you can calculate the fraction of the budget accrued in December.

  4. Relation between the standard deviation of accrued expenditures between January and November and the fraction of the budget accrued in December: This graph constitutes a first graphical exploration of professors Liebman and Mahoney's theory. We will build a simple variable that will serve as a proxy for the uncertainty faced by each agency, following the logic that agencies that face a greater uncertainty should display more variability in their monthly expenditures. The graph will show how this variable relates to the fraction of the budget accrued in December.

    Follow the following steps to produce the database you will use for this graph:

    1. Collapse the database, summing the accrued expenditures and the budget by month and agency id.

    2. For each month between January and November, divide that month's accrued expenditure by the total accrued expenditures between January and November.

    3. Calculate the accrued expenditure over the budget, only for December

    4. Collapse the database by agency, calculating the standard deviation of the variable created in part 2 and the mean of the variable created in part 3. These variables will serve, respectively, as the variables for the x and y axis of the plot.

Step 4: An Exploratory Analysis of Professors Liebman and Mahoney's Theory

We will run some regressions to explore the empirical validity of Liebman and Mahoney's theory. First, we will build the data needed to run the regressions. Once we have that data, we will run some regressions.

Building the data needed for the regressions

The last plot of the graphical analysis suggests that there is a relation between the standard deviation of the fraction of accrued expenditure between January and November and the fraction of the budget accrued in December. As argued above, the value of this standard deviation may be interpreted as a proxy for the degree of uncertainty in the social value of expenditure. If we knew no other factors influence the value of this standard deviation, we could safely conclude that the empirical evidence is consistent with the theory. However, other things might be influencing the value of this standard deviation as well. For instance, the second plot of the graphical analysis shows that some subtitles of the budget are more volatile. If we do not incorporate the share of an agency's budget assigned to each subtitle as an independent variable in the regression, we might conclude that greater uncertainty is associated with a larger year-end spike, when in fact the association is spurious and due to the fact that some subtitles are more volatile and have a larger year-end spike, and that agencies' differ in the composition of their budget.

The dataset we need for the regression has one observation per agency. The dependent variable is the fraction of the budget accrued in December (across all years in the sample). The independent variables are:

Please open the create-data-for-regressions-template.do file to build the required database. After creating the database, create a do file that produces summary table of the variables of interest (include the mean, standard deviation, minimum value, median, and maximum value). Run two regressions, one where you only include the standard deviation of accrued expenditures as a share of total accrued expenditures between January and November as an independent variable, and another one that also include the share of the budget assigned to each subtitle as independent variables. Save the results of your regressions to a word document using outreg2.

Please note that this is an extremely simple analysis, and does not constitute a serious test for the validity of the theory.

You can download the solutions to these exercises here (please try to solve the exercises by yourself before looking at the solutions).