D:/github/course-emiii-accompany/IV-wage-mroz
├── code-mroz.R
├── mroz-var-label.txt
├── workingpaper.html
├── workingpaper.qmd
└── workingpaper_files
    ├── figure-docx
    └── figure-html
IV Application (mroz)
Mother and father as IVs for education
Case Description
Research Interests:
With the data set wooldridge::mroz, researchers are interested in the return (log(wage)) to education (educ) for married women. They use mother's education (mothereduc) and/or father's education (fatheduc) as instruments for educ.
Reproducible Sources
Wooldridge, J.M. Introductory econometrics: a modern approach[M]. Seventh edition. Australia: Cengage, 2020.
Hill C, Griffiths W E, Lim G C. Principles of econometrics[M]. Fifth edition. NJ: John Wiley & Sons, 2018.
Colonescu C. Principles of Econometrics with R (2016) https://bookdown.org/ccolonescu/RPoE4/
Learning Targets
Understand the nature of endogeneity.
Know the steps of running the TSLS method.
Be familiar with the R package functions systemfit::systemfit() and AER::ivreg().
Test instrument validity (weak instruments) using both the restricted F-test and the J-test.
Test regressor endogeneity using the Hausman test.
Exercise Materials
You can find all the exercise materials in this project under the file directory listed at the top of this page (IV-wage-mroz).
OLS Estimation
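Consider the following misspecified wage model, in which educ may be correlated with the error term:

$$
\log(wage) = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 expersq + u
$$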
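A minimal sketch of this OLS regression in R. It assumes the wage equation regresses log(wage) on educ, exper and expersq, that the wooldridge package is installed, and that only women with an observed wage are used. The object names mroz_work and lm_ols are illustrative, not taken from the course code.

```r
# Assumption: the wooldridge package is installed; column names follow wooldridge::mroz
library(wooldridge)
data("mroz", package = "wooldridge")

# Keep only women with an observed wage (wage and lwage are missing otherwise)
mroz_work <- subset(mroz, !is.na(wage))

# OLS of log(wage) on education, experience and experience squared
lm_ols <- lm(lwage ~ educ + exper + expersq, data = mroz_work)
summary(lm_ols)$coefficients
```

If educ is endogenous, the coefficient on educ reported here is not a consistent estimate of the return to education.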
Two-stage least squares (TSLS): the solutions
We can conduct the TSLS procedure in either of two ways:
use the "Step-by-Step solution" without variance correction.
use the "Integrated solution" with variance correction.
By working through the Step-by-Step solution, we will understand the basic procedure of two-stage least squares (TSLS). But DO NOT use the Step-by-Step solution in your paper! It is only for teaching purposes here.
We need an Integrated solution for the following reasons:
We should obtain the correct standard errors for testing and inference.
We should avoid the tedious steps of the Step-by-Step routine. When the model contains more than one endogenous regressor and there are many available instruments, the step-by-step solution gets extremely tedious.
In the R ecosystem, we have two packages to execute the integrated solution:
We can use the systemfit package function systemfit::systemfit().
Or we may use the AER package function AER::ivreg().
Both of these tools can conduct the integrated solution, and will adjust the variance of the estimators automatically.
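TSLS (Step-by-step solution): mothereduc as IV
For the step-by-step solution, let's consider using mother's education (mothereduc) as the instrumental variable for education (educ).
In the first stage, we regress educ on the instrument and the exogenous regressors, so that we can obtain the fitted variable $\widehat{educ}$:

$$
educ = \pi_0 + \pi_1 mothereduc + \pi_2 exper + \pi_3 expersq + v
$$

In the second stage, we regress log(wage) on the fitted variable $\widehat{educ}$, experience (exper) and its quadratic term (expersq):

$$
\log(wage) = \beta_0 + \beta_1 \widehat{educ} + \beta_2 exper + \beta_3 expersq + u
$$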
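The two stages can be sketched with two calls to lm(). This is only an illustration under stated assumptions: the wooldridge package names the mother's education column motheduc (the text calls it mothereduc), and the names mroz_work, stage1, stage2 and educ_hat are hypothetical.

```r
# Assumption: the wooldridge package is installed; the instrument column is named motheduc there
library(wooldridge)
data("mroz", package = "wooldridge")
mroz_work <- subset(mroz, !is.na(wage))  # both stages must use the same sample

# Stage 1: regress the endogenous regressor on the instrument and the exogenous regressors
stage1 <- lm(educ ~ motheduc + exper + expersq, data = mroz_work)
mroz_work$educ_hat <- fitted(stage1)

# Stage 2: replace educ by its first-stage fitted value
stage2 <- lm(lwage ~ educ_hat + exper + expersq, data = mroz_work)
coef(stage2)

# The point estimates are the 2SLS estimates, but the standard errors printed by
# summary(stage2) ignore that educ_hat is itself estimated, so they are not valid.
```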
TSLS (Integrated solution): mothereduc as IV
Let's consider using mothereduc as the instrument for educ.
As noted above, either systemfit::systemfit() or AER::ivreg() can run the integrated solution; both adjust the variance of the estimators correctly and automatically.
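A hedged sketch of the integrated solution with both packages, using mother's education as the only instrument. It assumes the AER, systemfit and wooldridge packages are installed and uses the wooldridge column name motheduc. The object names lm_iv_m and sf_iv_m are illustrative.

```r
# Assumption: AER, systemfit and wooldridge are installed
library(wooldridge)
library(AER)        # provides ivreg()
library(systemfit)  # provides systemfit()
data("mroz", package = "wooldridge")
mroz_work <- subset(mroz, !is.na(wage))

# AER::ivreg(): regressors before "|", all instruments after it
# (exogenous regressors are listed again because they instrument themselves)
lm_iv_m <- AER::ivreg(lwage ~ educ + exper + expersq | exper + expersq + motheduc,
                      data = mroz_work)

# systemfit::systemfit(): the instruments go in the one-sided formula `inst`
sf_iv_m <- systemfit::systemfit(lwage ~ educ + exper + expersq, method = "2SLS",
                                inst = ~ exper + expersq + motheduc,
                                data = mroz_work)

summary(lm_iv_m)$coefficients
summary(sf_iv_m)
```

Both calls should report the same point estimates as the step-by-step routine, now with standard errors that account for the first stage.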
TSLS (Integrated solution): fatheduc as IV
Now let's consider using fatheduc as the instrument for educ.
We will repeat the whole procedure using the R function systemfit::systemfit() or AER::ivreg(), as we have done before.
TSLS (Integrated solution): mothereduc and fatheduc as IVs
Also, we can use both mothereduc and fatheduc as instruments for educ.
We will repeat the whole procedure using the R function systemfit::systemfit() or AER::ivreg(), as we have done before.
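Repeating the procedure only requires changing the instrument list. The sketch below uses AER::ivreg() and the wooldridge column names motheduc and fatheduc. The object name lm_iv_mf matches the one used later in the text, while lm_iv_f is an illustrative name.

```r
# Assumption: AER and wooldridge are installed
library(wooldridge)
library(AER)
data("mroz", package = "wooldridge")
mroz_work <- subset(mroz, !is.na(wage))

# Father's education as the only instrument for educ
lm_iv_f <- AER::ivreg(lwage ~ educ + exper + expersq | exper + expersq + fatheduc,
                      data = mroz_work)

# Mother's and father's education together (one more instrument than needed)
lm_iv_mf <- AER::ivreg(lwage ~ educ + exper + expersq |
                         exper + expersq + motheduc + fatheduc,
                       data = mroz_work)

summary(lm_iv_f)$coefficients
summary(lm_iv_mf)$coefficients
```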
Solutions comparison: a glance
So far we have obtained five estimation results with different model settings or solutions:
The misspecified model, estimated directly by OLS.
(Step-by-Step solution) Explicit 2SLS estimation without variance correction (IV regression step by step with only mothereduc as instrument).
(Integrated solution) Dedicated IV estimation with variance correction (using the R tools systemfit::systemfit() or AER::ivreg()):
- The IV model with only mothereduc as instrument for the endogenous variable educ.
- The IV model with only fatheduc as instrument for the endogenous variable educ.
- The IV model with both mothereduc and fatheduc as instruments for the endogenous variable educ.
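One possible way to put the five results side by side is to collect the estimate and standard error of the education coefficient from each fit. This is a sketch under assumptions: the helper get_educ and all object names are hypothetical, and the column names are those of wooldridge::mroz.

```r
# Assumption: AER and wooldridge are installed
library(wooldridge)
library(AER)
data("mroz", package = "wooldridge")
mroz_work <- subset(mroz, !is.na(wage))

# Step-by-step 2SLS needs the first-stage fitted values
mroz_work$educ_hat <- fitted(lm(educ ~ motheduc + exper + expersq, data = mroz_work))

fits <- list(
  ols          = lm(lwage ~ educ + exper + expersq, data = mroz_work),
  step_by_step = lm(lwage ~ educ_hat + exper + expersq, data = mroz_work),
  iv_mother    = AER::ivreg(lwage ~ educ + exper + expersq |
                              exper + expersq + motheduc, data = mroz_work),
  iv_father    = AER::ivreg(lwage ~ educ + exper + expersq |
                              exper + expersq + fatheduc, data = mroz_work),
  iv_both      = AER::ivreg(lwage ~ educ + exper + expersq |
                              exper + expersq + motheduc + fatheduc, data = mroz_work)
)

# Hypothetical helper: education is the second coefficient in every model
get_educ <- function(m) summary(m)$coefficients[2, c("Estimate", "Std. Error")]
comp <- t(sapply(fits, get_educ))
round(comp, 4)
```

The step_by_step and iv_mother rows share the same estimate but not the same standard error, which is the reason the step-by-step routine should not be reported.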
After this empirical comparison, we can think further about the results:
Which estimation is the best?
How to judge and evaluate different instrument choices?
Testing Instrument validity
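Consider the general model

$$
Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + \beta_{k+1} W_{1i} + \dots + \beta_{k+r} W_{ri} + u_i
$$

where
$Y_i$ is the dependent variable;
$\beta_0, \dots, \beta_{k+r}$ are unknown regression coefficients;
$X_{1i}, \dots, X_{ki}$ are endogenous regressors;
$W_{1i}, \dots, W_{ri}$ are exogenous regressors which are uncorrelated with $u_i$;
$u_i$ is the error term;
$Z_{1i}, \dots, Z_{mi}$ are instrumental variables.
As we know, valid instruments should satisfy both the relevance condition and the exogeneity condition.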
Weak instrument: Restricted F-test
In the case of a single endogenous regressor, we can use an F-test to check for weak instruments.
The basic idea of the F-test is very simple:
If the estimated coefficients of all instruments in the first stage of a 2SLS estimation are zero, the instruments do not explain any of the variation in the endogenous regressor, which violates the relevance condition.
We may use the following rule of thumb:
- Conduct the first-stage regression of a 2SLS estimation:

$$
X_i = \pi_0 + \pi_1 Z_{1i} + \dots + \pi_m Z_{mi} + \pi_{m+1} W_{1i} + \dots + \pi_{m+r} W_{ri} + v_i
$$

- Test the restricted joint hypothesis $H_0: \pi_1 = \dots = \pi_m = 0$ by computing the F-statistic.
- If the F-statistic is less than the critical value (a common benchmark is 10), the instruments are weak.
The rule of thumb is easily implemented in R. Run the first-stage regression using lm() and subsequently compute the restricted F-test with the R function car::linearHypothesis().
For all three IV models, we can test the relevance of the instrument(s) respectively.
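As an illustration, the restricted F-test for the model with both instruments could look like the sketch below. It assumes the car and wooldridge packages are installed and uses the wooldridge column names. The names first_mf, f_test and F_stat are hypothetical.

```r
# Assumption: car and wooldridge are installed
library(wooldridge)
library(car)
data("mroz", package = "wooldridge")
mroz_work <- subset(mroz, !is.na(wage))

# First-stage regression: endogenous regressor on instruments and exogenous regressors
first_mf <- lm(educ ~ motheduc + fatheduc + exper + expersq, data = mroz_work)

# Restricted F-test: both instrument coefficients are jointly zero
f_test <- car::linearHypothesis(first_mf, c("motheduc = 0", "fatheduc = 0"))
f_test

# Extract the F-statistic and compare it with the benchmark of 10
F_stat <- f_test[2, "F"]
F_stat > 10
```

For the single-instrument models, the same call with one restriction (for example only "motheduc = 0") gives the corresponding statistic.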
Weak instrument: Cragg-Donald test
The former test for weak instruments might be unreliable with more than one endogenous regressor, though, because there is one F-statistic for each endogenous regressor.
An alternative is the Cragg-Donald test, based on the following statistic:

$$
F = \frac{N - G - B}{L} \cdot \frac{r_B^2}{1 - r_B^2}
$$

where:
$N$ is the sample size;
$G$ is the number of exogenous regressors;
$B$ is the number of endogenous regressors;
$L$ is the number of external instruments;
$r_B$ is the lowest canonical correlation.
Canonical correlation is a measure of the correlation between the endogenous and the exogenous variables, which can be calculated by the function cancor() in R.
The critical value can be found in Table 10E.1 of Hill C, Griffiths W, Lim G. Principles of econometrics[M]. John Wiley & Sons, 2018.
Instrument Exogeneity: J-test
Instrument exogeneity means that all instruments must be uncorrelated with the structural error term.
In the context of the simple IV estimator, we will find that the exogeneity requirement can not be tested. (Why?)
However, if we have more instruments than we need, we can effectively test whether some of them are uncorrelated with the structural error.
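Under over-identification (more instruments than endogenous regressors, $m > k$), the model can be estimated with different combinations of instruments.
If the instruments are exogenous, the obtained estimates should be similar.
If the estimates are very different, some or all instruments may not be exogenous.
The overidentifying restrictions test (J-test) checks this formally.
- The null hypothesis is instrument exogeneity.
The overidentifying restrictions test (also called the J-test) is an approach to test the hypothesis that the additional instruments are exogenous.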
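The procedure of the overidentifying restrictions test is:
- Step 1: Compute the IV regression residuals:

$$
\hat{u}_i^{IV} = Y_i - \left( \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \dots + \hat{\beta}_k X_{ki} + \hat{\beta}_{k+1} W_{1i} + \dots + \hat{\beta}_{k+r} W_{ri} \right)
$$

- Step 2: Run the auxiliary regression: regress the IV residuals on the instruments and the exogenous regressors,

$$
\hat{u}_i^{IV} = \delta_0 + \delta_1 Z_{1i} + \dots + \delta_m Z_{mi} + \delta_{m+1} W_{1i} + \dots + \delta_{m+r} W_{ri} + e_i \quad (2)
$$

and test the joint hypothesis $H_0: \delta_1 = \dots = \delta_m = 0$.
- Step 3: Compute the J statistic:

$$
J = mF
$$

where $F$ is the F-statistic of the $m$ restrictions in eq(2).
Under the null hypothesis, $J$ follows a $\chi^2_{m-k}$ distribution in large samples (assuming homoskedastic errors), where $m$ is the number of instruments and $k$ is the number of endogenous regressors.
If $J$ is less than the critical value, we cannot reject that all instruments are exogenous.
If $J$ is larger than the critical value, some of the instruments are endogenous.
- We can apply the J-test by using the R function linearHypothesis().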
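Again, we can use both mothereduc and fatheduc as instruments for educ.
Thus, the IV model is over-identified, and we can test the exogeneity of these two instruments by using the J-test.
The 2SLS model will be set as below.

$$
\begin{aligned}
educ &= \pi_0 + \pi_1 mothereduc + \pi_2 fatheduc + \pi_3 exper + \pi_4 expersq + v \\
\log(wage) &= \beta_0 + \beta_1 \widehat{educ} + \beta_2 exper + \beta_3 expersq + u
\end{aligned}
$$

And the auxiliary regression should be

$$
\hat{u}^{IV} = \delta_0 + \delta_1 mothereduc + \delta_2 fatheduc + \delta_3 exper + \delta_4 expersq + e
$$

Finally, we can calculate the J-statistic by hand or obtain it by using special tools:
Calculate the J-statistic by hand.
Use the tool linearHypothesis(., test = "Chisq").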
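Both routes can be sketched as follows, assuming the AER, car and wooldridge packages are installed and using the wooldridge column names. With two instruments and one endogenous regressor there is one overidentifying restriction, so the p-value uses one degree of freedom. The p-value printed by linearHypothesis() uses two degrees of freedom and should not be used here. All object names are illustrative.

```r
# Assumption: AER, car and wooldridge are installed
library(wooldridge)
library(AER)
library(car)
data("mroz", package = "wooldridge")
mroz_work <- subset(mroz, !is.na(wage))

lm_iv_mf <- AER::ivreg(lwage ~ educ + exper + expersq |
                         exper + expersq + motheduc + fatheduc, data = mroz_work)

# Step 1: IV residuals (computed with the observed educ, not the fitted one)
mroz_work$res_iv <- residuals(lm_iv_mf)

# Step 2: auxiliary regression on instruments and exogenous regressors
aux <- lm(res_iv ~ motheduc + fatheduc + exper + expersq, data = mroz_work)
restrictions <- c("motheduc = 0", "fatheduc = 0")

# Step 3a: by hand, J = m * F with m = 2 instruments
F_aux  <- car::linearHypothesis(aux, restrictions, test = "F")[2, "F"]
J_hand <- 2 * F_aux

# Step 3b: the chi-square version returns the same statistic directly
J_tool <- car::linearHypothesis(aux, restrictions, test = "Chisq")[2, "Chisq"]

# Degrees of freedom: m - k = 2 - 1 = 1
p_value <- pchisq(J_hand, df = 1, lower.tail = FALSE)
c(J_hand = J_hand, J_tool = J_tool, p_value = p_value)
```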
Testing Regressor endogeneity
How can we test the regressor endogeneity?
Since OLS is in general more efficient than IV (recall that if the Gauss-Markov assumptions hold, OLS is BLUE), we do not want to use IV when it is not needed to obtain consistent estimators.
Of course, if we really want a consistent estimator, we also need to check whether the supposedly endogenous regressors are really endogenous in the model.
So we should test the following hypotheses:

$$
H_0: Cov(X, u) = 0 \quad \text{vs.} \quad H_1: Cov(X, u) \neq 0
$$
Hausman test
The Hausman test tells us that we should use OLS if we fail to reject $H_0$, and IV estimation if we reject $H_0$.
Let's see how to construct a Hausman test. The idea is very simple.
If $X$ is in fact exogenous, then both OLS and IV are consistent, but OLS estimates are more efficient than IV estimates.
If $X$ is in fact endogenous, then the OLS estimates are inconsistent, while the estimates obtained by IV (e.g. 2SLS) are consistent.
We can compare the difference between estimates computed using both OLS and IV.
- If the difference is small, we can conjecture that both OLS and IV are consistent and the small difference between the estimates is not systematic.
- If the difference is large this is due to the fact that OLS estimates are not consistent. We should use IV in this case.
Again, we use both mothereduc and fatheduc as instruments for educ in our IV model.
In R, we can use the IV model diagnostic tools to obtain the Hausman test results.
In fact, the R function summary(lm_iv_mf, diagnostics = TRUE), with diagnostics = TRUE set, will give you these results.
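A short sketch of this diagnostic call, assuming lm_iv_mf is an AER::ivreg() fit with both instruments and that the wooldridge column names are used. The second half is an addition that is not in the text: a regression-based variant of the test that adds the first-stage residual to the wage equation, shown only to make the idea concrete. The names diag_tab, first_mf, v_hat, aug and t_v are hypothetical.

```r
# Assumption: AER and wooldridge are installed
library(wooldridge)
library(AER)
data("mroz", package = "wooldridge")
mroz_work <- subset(mroz, !is.na(wage))

lm_iv_mf <- AER::ivreg(lwage ~ educ + exper + expersq |
                         exper + expersq + motheduc + fatheduc, data = mroz_work)

# Diagnostics table: weak instruments, Wu-Hausman and Sargan tests
diag_tab <- summary(lm_iv_mf, diagnostics = TRUE)$diagnostics
diag_tab["Wu-Hausman", ]

# Regression-based variant (not from the text): add the first-stage residual
# to the structural equation and test whether its coefficient is zero
first_mf <- lm(educ ~ motheduc + fatheduc + exper + expersq, data = mroz_work)
mroz_work$v_hat <- residuals(first_mf)
aug <- lm(lwage ~ educ + exper + expersq + v_hat, data = mroz_work)
t_v <- summary(aug)$coefficients["v_hat", "t value"]
t_v
```

With a single endogenous regressor, the squared t-statistic on v_hat should coincide with the Wu-Hausman F-statistic reported in the diagnostics table.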