Question 1
Suppose that a researcher, using data on class size (clsize) and test scores (score) from third-grade students, wishes to estimate the model
$latex
score = β_0 + β_1 clsize + u.
$
(i) What kinds of factors are contained in u? Are these likely to be correlated with class size?
 (Answer): One factor contained in u is the teacher's teaching quality: if it is higher, we can expect students' scores to be higher as well. If the school tends to allocate more students to the classes of better teachers, then teaching quality (and hence u) is positively correlated with clsize.
(ii) To estimate the parameter $latex β_1$ (and $latex β_0$), the researcher prefers to use the OLS estimator $latex \hat{β}_1$ (and $latex \hat{β}_0$) because they are unbiased. What does it mean to say that “the OLS estimator $latex \hat{β}_1$” is unbiased? In what kind of situation is $latex \hat{β}_1$ biased?
 (Answer):
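 It means that $latex E(\hat{β}_1) = β_1$: across repeated random samples, the estimator is centered on the true parameter. This holds under assumptions SLR.1 through SLR.4.
 When u contains factors that are correlated with clsize, we have $latex E(u \mid clsize) \neq 0$, which violates SLR.4 (zero conditional mean), so $latex \hat{β}_1$ is biased.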
(iii) Based on 100 observations, the researcher estimated the model by OLS:
$latex
\widehat{score} = 520.4 − 5.82 clsize \\
n = 100,\ \ \ \ R^2 = 0.88
$
Student A is in a classroom with a total of 19 students whereas student B is in a classroom with a total of 23 students. What is the regression’s prediction for the difference between the test scores of students A and B?
 (Answer):
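 A: plugging clsize = 19 into the OLS regression line, we have $latex \widehat{score}_A = 520.4 - 5.82 \times 19 = 409.82$.
 B: plugging clsize = 23 into the OLS regression line, we have $latex \widehat{score}_B = 520.4 - 5.82 \times 23 = 386.54$.
 The predicted difference is therefore $latex 409.82 - 386.54 = 23.28$ points in favor of student A, which equals $latex -5.82 \times (19 - 23)$.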
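The arithmetic for part (iii) can be checked with a few lines of Python. This is only a sketch that hard-codes the reported estimates; the function name `predict_score` is introduced here for illustration.

```python
# Reported fitted line: score_hat = 520.4 - 5.82 * clsize
INTERCEPT = 520.4
SLOPE = -5.82


def predict_score(clsize):
    """Predicted test score for a classroom with `clsize` students."""
    return INTERCEPT + SLOPE * clsize


pred_a = predict_score(19)  # student A
pred_b = predict_score(23)  # student B
diff = pred_a - pred_b      # same as SLOPE * (19 - 23)

print(round(pred_a, 2), round(pred_b, 2), round(diff, 2))
```

The intercept cancels in the difference, so only the slope and the gap of four students matter.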
(iv) Suppose we transform the data on score to score/100. How will the above OLS estimates and $latex R^2$ change?
 (Answer):
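 Dividing the dependent variable by 100 divides both estimates by 100, so $latex \hat{β}_0$ becomes 5.204 and $latex \hat{β}_1$ becomes −0.0582.
 $latex R^2$ will not change, because $latex R^2 = 1 - SSR/SST$, and rescaling the dependent variable multiplies both SSR and SST by the same factor $latex 1/100^2$.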
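The rescaling result can be illustrated numerically. The sketch below fits a simple regression by hand on a small made-up dataset (the numbers are hypothetical and are not the researcher's sample), then refits after dividing the dependent variable by 100.

```python
def simple_ols(x, y):
    """Return (intercept, slope, r_squared) from regressing y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ssr = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1 - ssr / sst


# Hypothetical toy data, used only to illustrate the scaling property.
clsize = [15, 18, 19, 21, 23, 25, 28, 30]
score = [440, 415, 410, 400, 385, 380, 355, 350]

b0, b1, r2 = simple_ols(clsize, score)
b0_s, b1_s, r2_s = simple_ols(clsize, [s / 100 for s in score])

print(b0, b1, r2)
print(b0_s, b1_s, r2_s)
```

Up to floating-point rounding, the second line of output shows the intercept and slope divided by 100 and the same R-squared as the first.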
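(v) Now suppose we transform score to log(score). Explain how the interpretation of the coefficient on clsize will change.
 (Answer): In the log-level model $latex \log(score) = β_0 + β_1 clsize + u$, a one-unit increase in clsize changes score by approximately $latex 100 \cdot β_1$ percent, rather than by $latex β_1$ points. The coefficient has to be re-estimated with log(score) as the dependent variable; the level estimate of −5.82 points per student cannot be read directly as a 5.82% decrease.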
Question 2
The data on working men was used to estimate the following equation:
$latex
\widehat{educ} = 10.36 − .094 sibs + .131 meduc + .210 feduc \\
n = 722, R^2 = 0.214.
$
where educ is years of schooling, sibs is number of siblings, meduc is mother’s years of schooling, and feduc is father’s years of schooling.
(i) Does sibs have the expected effect? Explain. Holding meduc and feduc fixed, by how much does sibs have to increase to reduce predicted years of education by one year? (A noninteger answer is acceptable here.)
 (Answer):
 Yes. Holding meduc and feduc fixed, each additional sibling is associated with a .094-year decrease in predicted educ. Thus sibs needs to increase by $latex 1/.094 \approx 10.64$ to reduce predicted education by one year.
(ii) Discuss the interpretation of the coefficient on meduc.
 (Answer):
 Holding sibs and feduc fixed, each additional year of meduc increases the predicted years of education of working men by .131 years.
(iii) Suppose that Man A has no siblings, and his mother and father each have 16 years of education. Man B has no siblings, and his mother and father each have 20 years of education. What is the predicted difference in years of education between B and A?
 (Answer):
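 For man A, plugging in sibs = 0, meduc = 16, feduc = 16: $latex \widehat{educ}_A = 10.36 + .131(16) + .210(16) = 15.816$.
 For man B, plugging in sibs = 0, meduc = 20, feduc = 20: $latex \widehat{educ}_B = 10.36 + .131(20) + .210(20) = 17.18$.
 Thus, man B is predicted to have $latex 17.18 - 15.816 = 1.364$ more years of education than man A.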
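As a quick check of part (iii), the fitted equation can be evaluated directly. This is a sketch using the reported coefficients; `predict_educ` is a name chosen here for illustration.

```python
def predict_educ(sibs, meduc, feduc):
    """Predicted years of schooling from the reported fitted equation."""
    return 10.36 - 0.094 * sibs + 0.131 * meduc + 0.210 * feduc


educ_a = predict_educ(sibs=0, meduc=16, feduc=16)  # Man A
educ_b = predict_educ(sibs=0, meduc=20, feduc=20)  # Man B
gap = educ_b - educ_a  # equals (0.131 + 0.210) * 4

print(round(educ_a, 3), round(educ_b, 3), round(gap, 3))
```

Because both men have the same number of siblings, the gap depends only on the four extra years of schooling for each parent.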
(iv) A researcher suggested including parents’ total years of schooling as an additional regressor in the above model. Explain whether this suggestion makes sense.
 (Answer):
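 No. The new regressor, parents’ total years of schooling, equals meduc + feduc, so it is perfectly collinear with meduc and feduc. This violates MLR.3 ("No perfect collinearity"), and the OLS estimates cannot be computed.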
(v) Now consider the following models estimated by OLS:
$latex
\widehat{educ} = \hat{β}_0 + \hat{β_1} sibs + \hat{β_2} peduc,\\
\widetilde{educ} = \tilde{β}_0 + \tilde{β_1} sibs.
$
Suppose sibs and peduc are negatively correlated. Compare the slope estimates $latex \hat{β}_1$ and $latex \tilde{β}_1$.
 (Answer):
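 The two estimates are related by $latex \tilde{β}_1 = \hat{β}_1 + \hat{β}_2 \tilde{δ}_1$, where $latex \tilde{δ}_1$ is the slope from regressing peduc on sibs. If sibs and peduc are negatively correlated, we have $latex \tilde{δ}_1 < 0$. Because we expect $latex \hat{β}_2 > 0$, the term $latex \hat{β}_2 \tilde{δ}_1$ is negative. Thus $latex \tilde{β}_1 < \hat{β}_1$.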
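The relation between the short and long regression slopes, $latex \tilde{β}_1 = \hat{β}_1 + \hat{β}_2 \tilde{δ}_1$, is an algebraic identity and can be verified on any dataset. The sketch below uses a small made-up sample (hypothetical numbers, not the data behind the reported estimates) and closed-form OLS formulas for one and two regressors.

```python
def cross(a, b):
    """Sum of cross-products of deviations from the mean."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))


# Hypothetical toy data in which sibs and peduc are negatively correlated.
sibs = [0, 1, 1, 2, 3, 4, 5, 6]
peduc = [32, 30, 28, 26, 24, 24, 20, 18]
educ = [17, 16, 16, 14, 13, 12, 12, 10]

s11, s22, s12 = cross(sibs, sibs), cross(peduc, peduc), cross(sibs, peduc)
s1y, s2y = cross(sibs, educ), cross(peduc, educ)

# Long regression: educ on sibs and peduc (two-regressor closed form).
det = s11 * s22 - s12 ** 2
beta1_hat = (s22 * s1y - s12 * s2y) / det
beta2_hat = (s11 * s2y - s12 * s1y) / det

# Short regression: educ on sibs only.
beta1_tilde = s1y / s11

# Auxiliary regression: peduc on sibs.
delta1_tilde = s12 / s11

print(beta1_tilde, beta1_hat + beta2_hat * delta1_tilde)
```

The two printed numbers agree up to rounding. In this toy sample the auxiliary slope is negative and the coefficient on peduc is positive, so the short-regression slope lies below the long-regression slope, matching the sign argument in the answer.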
(vi) If the researcher wishes to choose one of the models in (v), what would you recommend? Do you recommend comparing $latex R^2$? Or something else?
* (Answer):
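* Because sibs and peduc are correlated, if we omit peduc, then $latex \tilde{β}_1$ will pick up part of the effect of peduc, so the magnitude of the effect of sibs is overestimated. This is called omitted variable bias.
* I would recommend the adjusted R-squared to measure the goodness-of-fit, because the adjusted R-squared accounts for the degrees of freedom of the regression, whereas $latex R^2$ never decreases when a regressor is added.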
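For reference, the adjusted R-squared is $latex 1 - (1 - R^2)(n-1)/(n-k-1)$, where k is the number of regressors. The sketch below applies it to the three-regressor equation reported at the start of this question (n = 722, $latex R^2$ = .214), purely as an illustration; no R-squared values are reported for the two models in (v), so they cannot be compared here.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k regressors and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Illustration with the reported three-regressor equation for educ.
adj = adjusted_r2(0.214, n=722, k=3)
print(round(adj, 4))
```

With n this large relative to k, the penalty is small, so the adjusted value sits only slightly below .214.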
Question 3
In this question, we consider testing the rationality of assessments of housing prices. In the simple regression model
$latex
price = β_0 + β_1 assess + u,
$
the assessment is rational if $latex β_1 = 1$ and $latex β_0 = 0$. The estimated equation is
$latex
\widehat{price} = \underset{(16.27)}{−14.47} + \underset{(.049)}{.976} assess \\
n = 88,\ SSR = 165,000,\ R^2 = .820.
$
(i) First, test the hypothesis that $latex H_0: β_0 = 0$ against the two-sided alternative. Then, test $latex H_0: β_1 = 1$ against the two-sided alternative. What do you conclude?
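 (Answer):
 For $latex H_0: β_0 = 0$, with $latex df = n - 2 = 86$ and a two-sided test, the critical value from the Student t table is about 1.99 at the 5% significance level (about 1.66 at the 10% level). The t statistic is $latex t = -14.47/16.27 \approx -0.889$, and $latex |t|$ is below the critical value, thus we fail to reject $latex H_0$.
 For $latex H_0: β_1 = 1$, the t statistic is $latex t = (.976 - 1)/.049 \approx -0.490$, and $latex |t|$ is again below the critical value. Thus we fail to reject $latex H_0$.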
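The two t statistics in part (i) can be reproduced from the reported estimates and standard errors. The sketch below hard-codes an approximate 10% two-sided critical value for 86 degrees of freedom (about 1.66, taken from a t table rather than computed); it is an assumption of this example, as is the helper name `t_stat`.

```python
def t_stat(estimate, hypothesized, std_err):
    """t statistic for H0: parameter = hypothesized."""
    return (estimate - hypothesized) / std_err


t_intercept = t_stat(-14.47, 0.0, 16.27)  # H0: beta_0 = 0
t_slope = t_stat(0.976, 1.0, 0.049)       # H0: beta_1 = 1

# Approximate two-sided 10% critical value with 86 df (from a t table).
CRIT_10PCT = 1.66

for name, t in [("beta_0 = 0", t_intercept), ("beta_1 = 1", t_slope)]:
    decision = "reject" if abs(t) > CRIT_10PCT else "fail to reject"
    print(f"H0: {name}: t = {t:.3f}, {decision} at the 10% level")
```

Since neither statistic exceeds the 10% critical value in absolute value, neither null is rejected at the stricter 5% or 1% levels either.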
(ii) To test the joint hypothesis that $latex β_0 = 0$ and $latex β_1 = 1$, we need the SSR in the restricted model. This amounts to computing $latex \sum_{i=1}^{n} (price_i - assess_i)^2$, where n = 88, since the residuals in the restricted model are just $latex price_i - assess_i$. (No estimation is needed for the restricted model because both parameters are specified under $latex H_0$.) This turns out to yield SSR = 210,000. Carry out the F test for the joint hypothesis.

(Answer):

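By definition,
$latex
F=\frac{\frac{(SSR_r - SSR_{ur})}{q}}{\frac{SSR_{ur}}{n-k-1}}
$
with $latex SSR_r = 210{,}000$, $latex SSR_{ur} = 165{,}000$, $latex q = 2$, and $latex n-k-1 = 86$, then
$latex
F=\frac{(210{,}000 - 165{,}000)/2}{165{,}000/86} \approx 11.73.
$

At the 1% significance level with q = 2 and df = 90 (the tabulated value nearest to $latex n-k-1 = 86$), we obtain the critical value $latex c = 4.85$, where $latex F \approx 11.73 > c$. Thus $latex H_0$ is rejected.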
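The SSR form of the F statistic is simple enough to compute by hand, but a short sketch makes the inputs explicit. The function name `f_stat_ssr` is introduced here for illustration, and the 1% critical value of 4.85 is the table value quoted in the answer, not something the code computes.

```python
def f_stat_ssr(ssr_r, ssr_ur, q, df_ur):
    """F statistic from restricted and unrestricted sums of squared residuals."""
    return ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)


n, k = 88, 1  # observations and regressors in the unrestricted model
f_joint = f_stat_ssr(ssr_r=210_000, ssr_ur=165_000, q=2, df_ur=n - k - 1)

CRIT_1PCT = 4.85  # table value quoted in the answer for (2, 90) df
print(round(f_joint, 2), f_joint > CRIT_1PCT)
```

The statistic is well above the quoted critical value, which is why the joint null is rejected even though neither single-parameter t test in part (i) rejects.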

(iii) Now, test $latex H_0: β_2 = 0, β_3 = 0, β_4 = 0$ in the model
$latex
price = β_0 + β_1 assess + β_2 lotsize + β_3 sqrft + β_4 bdrms + u.
$
The R-squared from estimating this model using the same 88 houses is .829.
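 (Answer):
 By definition, we have:
$latex
F=\frac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n-k-1)}
$
with $latex R^2_{ur} = .829$, $latex R^2_r = .820$, q = 3, and n-k-1 = 83, we have
$latex
F=\frac{(.829 - .820)/3}{(1 - .829)/83} \approx 1.46.
$
We fail to reject $latex H_0$ because at the 10% significance level with 3 and 83 df, the critical value is 2.15.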
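The R-squared form of the F statistic for part (iii) can be sketched the same way. Here the restricted model is the simple regression of price on assess ($latex R^2$ = .820) and the unrestricted model adds lotsize, sqrft, and bdrms ($latex R^2$ = .829). The helper name `f_stat_r2` is illustrative, and 2.15 is the 10% critical value quoted in the answer.

```python
def f_stat_r2(r2_ur, r2_r, q, df_ur):
    """F statistic from unrestricted and restricted R-squared values."""
    return ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_ur)


n, k = 88, 4  # observations and regressors in the unrestricted model
f_excl = f_stat_r2(r2_ur=0.829, r2_r=0.820, q=3, df_ur=n - k - 1)

CRIT_10PCT = 2.15  # table value quoted in the answer for (3, 83) df
print(round(f_excl, 2), f_excl > CRIT_10PCT)
```

The statistic falls short of the quoted critical value, so the three added regressors are not jointly significant at the 10% level.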
(iv) If the variance of price changes with assess, lotsize, sqrft, or bdrms, what can you say about the F test from part (iii)?
 (Answer):
 Assumption MLR.5 (homoscedasticity) would be violated because the variance changes with the regressors. Under heteroscedasticity, the F statistic no longer has an F distribution under the null hypothesis. Thus the usual F test from part (iii) would not be valid.
(v) A researcher complained that the test result in (iii) is invalid (i.e., the type I error probability is not controlled at the declared significance level) because the correlations among the regressors lotsize, sqrft, and bdrms are too large. Respond to this comment.
 (Answer):
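 The correlations among the regressors do not invalidate the test. High (but not perfect) correlation violates none of the assumptions behind the F test, so the F statistic still has an F distribution under the null hypothesis and the type I error probability is controlled at the declared level. Strong correlation makes the individual coefficients harder to estimate precisely, but this is a joint hypothesis test: we ask whether the three regressors together are statistically significant, and for that question the correlation among them is not a problem.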
Question 4
To test the effectiveness of a job training program on the subsequent wages of workers, we specify the model
$latex
wage = β_0 + β_1 train + β_2 educ + β_3 exper + u, (0.1)
$
where train is a binary variable equal to unity if a worker participated in the program.
(i) Think of the error term as containing unobserved worker ability. If less able workers have a greater chance of being selected for the program, and you use an OLS analysis, what can you say about the likely bias in the OLS estimator of $latex β_1$?
(ii) Suppose Assumptions MLR.1 through MLR.5 hold true, but we suspect that Assumption MLR.6 (u is normally distributed) is violated. Explain how to test the null hypothesis against the alternative.
(iii) Suppose the variances are different depending on the status of training:
$latex
Var(wage \mid train = 0, educ, exper) \ne Var(wage \mid train = 1, educ, exper).
$
Is the result in (ii) still reliable? If not, how would you modify the result?
(iv) Suppose it is known that
$latex
Var(u \mid train, educ, exper) = σ^2 educ,
$
where $latex σ^2$ is some unknown constant. Can you suggest a better estimator than OLS? Why is it better?
(v) Now consider the model
$latex
wage = β_0 + β_1 train + β_2 educ + β_3 exper + β_4 train \cdot exper + u.
$
What is the interpretation of the coefficient $latex β_4$? Explain.
(vi) Suppose we want to test whether the effect of the job training is different between males and females. How would you respecify the model in (0.1) by using the binary variable female? (female equals unity if female, and zero if male.) Also write the hypothesis to test.