EC212: WEEKEND PROBLEM SET

Question 1

Suppose that a researcher, using data on class size (clsize) and test scores (score) from third-grade students, wishes to estimate the model

$latex
score = β_0 + β_1 clsize + u.
$

(i) What kinds of factors are contained in u? Are these likely to be correlated with class size?

  • (Answer): u contains everything other than class size that affects test scores, e.g. teacher quality (call it teachq), family background, and school resources. If teachq is higher, we expect student scores to be higher as well. And if schools tend to allocate more students to better teachers' classes, then teachq is positively correlated with clsize, so u and clsize are correlated.

(ii) To estimate the parameter β_1 (and β_0 ), the researcher prefers to use the OLS estimator \hat{β}_1 (and \hat{β}_0 ) because they are unbiased. What does it mean to say that “the OLS estimator \hat{β}_1 ” is unbiased? In what kind of situation is \hat{β}_1 biased?

  • (Answer):
    • Unbiasedness means that E(\hat{\beta}_1) = \beta_1 under Assumptions SLR.1-4: across repeated samples, the estimator equals the true parameter on average.
    • When u contains factors that are correlated with clsize , we have E(u|x) \ne 0 , which violates SLR.4, and \hat{\beta}_1 is biased.

(iii) Based on 100 observations, the researcher estimated the model by the OLS

$latex
\widehat{score} = 520.4 − 5.82 clsize \\
n = 100,\ \ \ \ R^2 = 0.88
$

Student A is in a classroom with a total of 19 students whereas student B is in a classroom with a total of 23 students. What is the regression’s prediction for the difference between student A’s and student B’s test scores?

  • (Answer):
    • A: plugging clsize = 19 into the OLS line gives \widehat{score} = 520.4 − 5.82 \times 19 \approx 409.82
    • B: plugging clsize = 23 gives \widehat{score} = 520.4 − 5.82 \times 23 \approx 386.54
    • The predicted difference is therefore 409.82 − 386.54 = −5.82 \times (19 − 23) = 23.28 : student A is predicted to score 23.28 points higher than student B.

(iv) Suppose we transform the data on score to score/100 . How will the above OLS estimates and R^2 change?

  • (Answer):
    • Since newscore = score/100 , substituting score = 100 \cdot newscore shows that both OLS estimates are divided by 100: the new intercept is 5.204 and the new slope is −0.0582.
    • R^2 will not change, because R^2=\frac{SSE}{SST}=\frac{\sum_i^n(\hat{y}_i - \bar{y})^2}{\sum_i^n(y_i - \bar{y})^2} , and rescaling y multiplies numerator and denominator by the same factor (1/100)^2 , leaving the ratio unchanged.

(v) Now suppose we transform score to log(score). Explain how the interpretation of the coefficient on clsize will change.

  • (Answer): With log(score) as the dependent variable, the coefficient on clsize becomes a semi-elasticity: a one-student increase in class size changes score by approximately 100\hat{\beta}_1 percent, holding other factors fixed. (The coefficient must be re-estimated on the transformed data, so it will not equal −5.82.) The approximation behind this reading is shown below.
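For small changes:

$latex
\Delta \log(score) = \hat{\beta}_1 \Delta clsize, \qquad \Delta \log(score) \approx \frac{\Delta score}{score} \ \Rightarrow\ \%\Delta score \approx 100\,\hat{\beta}_1\, \Delta clsize.
$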

Question 2

The data on working men was used to estimate the following equation:

$latex
\widehat{educ} = 10.36 − .094 sibs + .131 meduc + .210 feduc \\
n = 722, R^2 = 0.214.
$

where educ is years of schooling, sibs is number of siblings, meduc is mother’s years of schooling, and feduc is father’s years of schooling.

(i) Does sibs have the expected effect? Explain. Holding meduc and feduc fixed, by how much does sibs have to increase to reduce predicted years of education by one year? (A noninteger answer is acceptable here.)

  • (Answer):
    • Yes: more siblings means fewer resources per child, so a negative coefficient is expected. Holding other factors fixed, each additional sibling reduces predicted educ by .094 years, so sibs must increase by \frac{1}{.094} \approx 10.638 to reduce predicted education by one year.

(ii) Discuss the interpretation of the coefficient on meduc.

  • (Answer):
    • Holding sibs and feduc fixed, each additional year of mother’s education ( meduc ) increases predicted years of schooling by 0.131 years.

(iii) Suppose that Man A has no siblings, and his mother and father each have 16 years of education. Man B has no siblings, and his mother and father each have 20 years of education. What is the predicted difference in years of education between B and A?

  • (Answer):
    • For Man A, plugging in sibs=0, meduc=16, feduc=16 :
      \widehat{educ}_A = 10.36 - .094 \times 0 + .131 \times 16 + .210 \times 16 \approx 15.816
    • For Man B, plugging in sibs=0, meduc=20, feduc=20 :
      \widehat{educ}_B = 10.36 - .094 \times 0 + .131 \times 20 + .210 \times 20 \approx 17.180
    • Thus, Man B is predicted to have 17.180 − 15.816 = 1.364 more years of education than Man A.

(iv) Some researcher suggested including parents’ total years of schooling peduc = meduc + feduc as an additional regressor in the above model. Explain whether this suggestion makes sense.

  • (Answer):
    • The suggestion does not make sense: the new regressor peduc = meduc + feduc is an exact linear combination of meduc and feduc , so including all three violates Assumption MLR.3 (“no perfect collinearity”) and the OLS estimates are not defined.

(v) Now consider the following models estimated by the OLS:

$latex
\widehat{educ} = \hat{β}_0 + \hat{β_1} sibs + \hat{β_2} peduc,\\
\widetilde{educ} = \tilde{β}_0 + \tilde{β_1} sibs.
$

Suppose sibs and peduc are negatively correlated. Compare the slope estimates \hat{β}_1 and \tilde{β}_1 .

  • (Answer):
    • By the omitted-variable formula, \tilde{\beta}_1 = \hat{\beta}_1 + \hat{\beta}_2 \tilde{\delta}_1 , where \tilde{\delta}_1 is the slope from regressing peduc on sibs . Since \hat{\beta}_2 > 0 and sibs and peduc are negatively correlated, \tilde{\delta}_1 < 0 , so the second term is negative and \tilde{\beta}_1 < \hat{\beta}_1 ; see the simulation sketch below.
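The sign of this bias can be checked with a minimal simulation sketch in Python (synthetic numbers chosen only for illustration; this is not the survey data):

import numpy as np

rng = np.random.default_rng(0)
n = 10000
peduc = rng.normal(24, 4, size=n)                        # parents' total schooling
sibs = 8 - 0.2 * peduc + rng.normal(0, 1, size=n)        # negatively correlated with peduc
educ = 10 - 0.1 * sibs + 0.3 * peduc + rng.normal(0, 1, size=n)

# Long regression: educ on (1, sibs, peduc) -> beta_hat
X_long = np.column_stack([np.ones(n), sibs, peduc])
beta_hat = np.linalg.lstsq(X_long, educ, rcond=None)[0]

# Short regression: educ on (1, sibs), with peduc omitted -> beta_tilde
X_short = np.column_stack([np.ones(n), sibs])
beta_tilde = np.linalg.lstsq(X_short, educ, rcond=None)[0]

print(beta_hat[1], beta_tilde[1])   # beta_tilde[1] < beta_hat[1], as derived above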

(vi) If the researcher wishes to choose one of the models in (v), what would you recommend? Do you recommend comparing R^2 ? Or something else?
  • (Answer):
    • Because peduc and sibs are correlated, omitting peduc makes sibs pick up part of the effect of peduc , so \tilde{\beta}_1 suffers from omitted variable bias. For that reason the longer model is preferable.
    • To compare goodness-of-fit across models with different numbers of regressors, I would recommend the adjusted R-squared rather than R^2 , because the adjusted R-squared accounts for degrees of freedom and penalizes additional regressors, whereas R^2 never decreases when a regressor is added.

Question 3

In this question, we consider testing the rationality of assessments of housing prices. In the simple regression model
$latex
price = β_0 + β_1 assess + u,
$

the assessment is rational if β_1 = 1 and β_0 = 0 . The estimated equation is

$latex
\widehat{price} = \underset{(16.27)}{−14.47} + \underset{(.049)}{.976}\, assess \\
n = 88,\ SSR = 165,000,\ R^2 = .820.
$

(i) First, test the hypothesis that H_0 : β_0 = 0 against the two-sided alternative. Then, test H_0 : β_1 = 1 against the two-sided alternative. What do you conclude?

  • (Answer):
    • For H_0:\beta_0 = 0 , with df = n-k-1 = 86 and a 5% significance level (two-sided), the critical value from the t table is c = 1.987. The test statistic is t_{\hat{\beta}_0} = \frac{\hat{\beta}_0}{se(\hat{\beta}_0)} = (-14.47)/(16.27) \approx -0.89 , and |-0.89| < 1.987 , so we fail to reject H_0 .
    • For H_0: \beta_1=1 , the statistic is t_{\hat{\beta}_1} = \frac{\hat{\beta}_1-1}{se(\hat{\beta}_1)} = (0.976-1)/.049 \approx -0.49 , and |-0.49| < 1.987 , so we fail to reject H_0: \beta_1=1 . Neither individual test rejects rationality; a quick numerical check follows.
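As a quick numerical check, here is a minimal Python sketch using scipy.stats, plugging in the estimates and standard errors quoted above:

from scipy.stats import t

c = t.ppf(0.975, 86)                     # two-sided 5% critical value, ~1.988
t_b0 = (-14.47 - 0) / 16.27              # ~ -0.89
t_b1 = (0.976 - 1) / 0.049               # ~ -0.49
print(c, abs(t_b0) < c, abs(t_b1) < c)   # both True: fail to reject both nulls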

(ii) To test the joint hypothesis that β_0 = 0 and β_1 = 1 , we need the SSR in the restricted model. This amounts to computing \sum_{i=1}^n (price_i − assess_i)^2 , where n = 88, since the residuals in the restricted model are just price_i − assess_i . (No estimation is needed for the restricted model because both parameters are specified under H_0 .) This turns out to yield SSR = 210,000. Carry out the F test for the joint hypothesis.

  • (Answer):

    • By definition,
      $latex
      F=\frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)}
      $

    • with q = 2, n-k-1 = 86 , SSR_r = 210,000 and SSR_{ur}=165,000 , this gives F = \frac{(210,000 - 165,000)/2}{165,000/86} \approx 11.73

    • At the 1% significance level with q = 2 numerator df and denominator df = 90 (the table entry closest to 86), the critical value is c = 4.85. Since 11.73 > 4.85, H_0 is rejected: the assessments are not rational in the joint sense.

(iii) Now, test H_0 : β_2 = 0, β_3 = 0,\ and\ β_4 = 0 in the model
$latex
price = β_0 + β_1 assess + β_2 lotsize + β_3 sqrft + β_4 bdrms + u.
$

The R-squared from estimating this model using the same 88 houses is .829.

  • (Answer):
    • By definition, the R-squared form of the F statistic is
      F=\frac{(R^2_{ur} - R^2_{r})/q}{(1 - R^2_{ur})/(n-k-1)}
      With R^2_{ur}=.829 , R^2_{r}=.820 , q=3, n-k-1=83, we have
      F=\frac{(.829 - .820)/3}{(1 - .829)/83} \approx 1.46
      We fail to reject H_0 , because at the 10% significance level with (3, 83) df the critical value is 2.15 and 1.46 < 2.15. A numerical check of both F tests follows.

(iv) If the variance of price changes with assess, lotsize, sqrft, or bdrms, what can you say about the F test from part (iii)?

  • (Answer):
    • Assumption MLR.5 (homoscedasticity) would be violated. Under heteroscedasticity the statistic no longer has an F distribution under the null hypothesis, so the F test in (iii) is invalid; a heteroscedasticity-robust test would be needed instead.

(v) Some researcher complained that the test result in (iii) is invalid (i.e., the type I error probability is not controlled at the declared significance level) because the correlations among the regressors lotsize, sqrft, and bdrms are too large. Respond to this comment.

  • (Answer):
    • The complaint is unfounded. High (but not perfect) correlation among lotsize, sqrft, and bdrms does not invalidate the F test: multicollinearity inflates the individual standard errors, but the joint test of whether the three regressors are jointly significant remains valid, and its type I error probability is still controlled at the declared level.

Question 4

To test the effectiveness of a job training program on the subsequent wages of workers, we specify the model
$latex
wage = β_0 + β_1 train + β_2 educ + β_3 exper + u, (0.1)
$
where train is a binary variable equal to unity if a worker participated in the program.

(i) Think of the error term u as containing unobserved worker ability. If less able workers have a greater chance of being selected for the program, and you use an OLS analysis, what can you say about the likely bias in the OLS estimator of β_1 ?

(ii) Suppose Assumptions MLR.1-5 hold true. But we suspect Assumption MLR.6 (u is normally distributed) is violated. Explain how to test the hypothesis H_0 : β_1 = 0 against H_1 : β_1 \ne 0 .

(iii) Suppose the variances are different depending on the status of training:
$latex
Var(wage|train = 0, educ, exper) \ne Var(wage|train = 1, educ, exper).
$
Is the result in (ii) still reliable? If not, how would you modify the result?

(iv) Suppose it is known that
$latex
Var(u|train, educ, exper) = σ^2 educ,
$
where σ^2 is some unknown constant. Can you suggest a better estimator than the OLS? Why is it better?

(v) Now consider the model
$latex
wage = β_0 + β_1 train + β_2 educ + β_3 exper + β_4 train \cdot exper + u.
$

What is the interpretation of the coefficient β_4 ? Explain.

(vi) Suppose we want to test whether the effect of the job training is different between males and females. How would you re-specify the model in (0.1) by using the binary variable female? (female equals unity if female, and zero if male.) Also write the hypothesis to test.


A PageRank View of Research Rank

The present-day “publish or perish” madness in academia involves counting the number of papers researchers have published, as well as the number of citations their papers receive.

(a) One might argue that not all citations are equally valuable: a citation in a paper that is itself often cited is more valuable than a citation in a paper that no one cites. Design a PageRank-style algorithm which would rank papers according to their “importance”, and then use such an algorithm to rank researchers by their “importance”.

(b) Assume now that you do not have information on the citations in each published paper, but instead you have for every researcher a list of other researchers who have cited him and how many times they cited him. Design again a PageRank style algorithm which would rank researchers by their importance.

Solution-(a):

We can view this as a PageRank problem: each paper is a web page, each citation is a hyperlink, and the paper ranking plays the role of the web-page ranking. We then have a system of equations, one for each paper P in the paper library:

\left\{ \rho(P) = \sum_{P_i \rightarrow P}{\frac{\rho(P_i)}{\#(P_i)}} \right\}_{P \ \in \ \text{Paper-Library}}

This says that the importance of paper P depends on the importance scores of the papers that cite P, each diluted by \#(P_i) , the number of papers P_i cites.

We use a matrix C_1 to represent the citation relationship such that:

c(i,j) = \begin{cases} \frac{1}{\#(P_i)} & \text{if paper } P_i \text{ cites } P_j \\ 0 & \text{otherwise} \end{cases}

There may be dangling papers that cite no other paper; we add \frac{1}{M}\mathbf{de^T} to C_1 , where \mathbf{d} is the indicator vector of dangling papers, \mathbf{e} is the all-ones vector, and M is the number of papers, so that each dangling paper effectively points to every paper. Besides, a reader of one paper may sometimes jump to a random paper rather than follow the citation list; we let \theta denote the probability of following citations. The Google-matrix-like paper citation matrix can then be written as:

\mathbb{C} = \theta\left( C_1 + \frac{1}{M}\mathbf{de^{T}} \right) + \frac{1 - \theta}{M}\mathbf{ee^T}

To compute the paper ranks iteratively, we use the vector \pi^*_i to represent the intermediate result after the i-th round, and repeatedly apply the power-method update

{(\pi^*_{i+1})}^T = {(\pi^*_{i})}^T\mathbb{C}

until \pi^* converges. The converged vector then contains the importance score of each paper.

Finally, the importance score of researcher R_i equals the sum of the importance scores of all his or her own papers, i.e.

R_i = \sum_{P_j\ \text{authored by}\ R_i}\pi[j]
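A minimal Python sketch of this procedure, assuming the matrix \mathbb{C} has already been built as a NumPy array; the names paper_rank, researcher_score, and authored_by are illustrative, not part of the problem statement:

import numpy as np

def paper_rank(C, tol=1e-10, max_iter=1000):
    """Iterate pi^T <- pi^T C until the scores stop changing."""
    M = C.shape[0]
    pi = np.full(M, 1.0 / M)            # start from the uniform vector
    for _ in range(max_iter):
        new_pi = pi @ C                 # one power-method step
        if np.abs(new_pi - pi).sum() < tol:
            break
        pi = new_pi
    return pi

def researcher_score(pi, authored_by, researcher):
    """Sum the scores pi[j] of the papers the researcher authored."""
    return sum(pi[j] for j in authored_by[researcher])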

Solution-(b):

Analysis:

This can still be modeled as a PageRank problem as in (a), but the construction of the citation matrix needs a small modification.

Firstly, since we only know how many citations a researcher R_i received from each other researcher R_j , we can directly construct only a receiving matrix rather than a citation matrix; transposing it turns it back into a citation matrix.

Secondly, when calculating the adjacency relation r(i,j) , instead of assigning it \frac{1}{\text{number of nodes } i \text{ points to}} (i.e. \frac{1}{\#(P_i)} ), we set it equal to \frac{\text{number of times node } j \text{ points to } i}{\text{total number of times } i \text{ is pointed to}} : the numerator k is the number of times researcher i received citations from a specific researcher j, and the denominator, denoted \#(R_i) , is the total number of citations R_i has received.

Procedure:

From this analysis, we have the equation

r(i,j) = \begin{cases} \frac{k}{\#(R_i)} & \text{if researcher } R_i \text{ received citations from } R_j \text{ for } k \text{ times} \\ 0 & \text{otherwise} \end{cases}

Using this equation, we construct the researcher citation-receiving matrix \mathbb{R} , where \mathbb{R}[i][j] is the weight researcher R_i receives from R_j . Its transpose \mathbb{R}^T is then a citation matrix again, which fits the Google-matrix construction; we denote it by C_1 .

Similarly, to tackle the dangling-node and randomization problems, we use the following equation to construct the Google-matrix-like researcher citation matrix \mathbb{C} , with citation-following probability \theta :

\mathbb{C} = \theta\left( C_1 + \frac{1}{M}\mathbf{de^{T}} \right) + \frac{1 - \theta}{M}\mathbf{ee^T}

Then, letting \pi_i^* denote the researcher importance vector after round i, we iterate

(\pi_{i+1}^*)^T = (\pi_i^*)^T \mathbb{C}

until it converges; \pi^* then contains the importance score of each researcher, and sorting researchers by decreasing score gives the ranking.
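A minimal Python sketch of this construction, under the assumed (hypothetical) input format received[i][j] = k, meaning researcher i was cited k times by researcher j:

import numpy as np

def build_citation_matrix(received, n):
    """Build R row by row, normalize by total citations received, transpose."""
    R = np.zeros((n, n))
    for i, citers in received.items():
        total = sum(citers.values())    # #(R_i): total citations i received
        for j, k in citers.items():
            R[i, j] = k / total         # r(i, j) = k / #(R_i)
    return R.T                          # transpose: receiving matrix -> C_1

def google_matrix(C1, theta):
    """Apply the dangling-node fix and random-jump smoothing from above."""
    n = C1.shape[0]
    dangling = (C1.sum(axis=1) == 0).astype(float)   # d: rows with no out-links
    C1 = C1 + np.outer(dangling, np.ones(n)) / n     # C_1 + (1/M) d e^T
    return theta * C1 + (1 - theta) / n * np.ones((n, n))

The resulting matrix \mathbb{C} can then be fed to the same power iteration as in solution (a).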


A Simple Perspective on PageRank

Below is given the graph of the Internet one millisecond after the Big Bang. Construct the corresponding Google matrix with α = 7/8.

[Figure 1: The Internet 1 ms after the Big Bang]

Solution:

With equation

g(i,j) = \begin{cases} \frac{1}{\#(P_i)} & \text{if } P_i \rightarrow P_j \\ 0 & \text{otherwise} \end{cases}

we can obtain the topology matrix G_1 :

G_1 =  \begin{matrix}  0 & 1/4 & 1/4 & 1/4 & 0 & 0 & 0 & 1/4 \\  1/4 & 0 & 1/4 & 0 & 1/4 & 0 & 1/4 & 0 \\  0 & 1/3 & 0 & 1/3 & 0 & 0 & 1/3 & 0 \\  0 & 0 & 0 & 0 & 1/2 & 1/2 & 0 & 0 \\  0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\  1/3 & 0 & 1/3 & 0 & 0 & 0 & 1/3 & 0 \\  0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\  1/3 & 0 & 1/3 & 0 & 1/3 & 0 & 0 & 0  \end{matrix}

We find that the graph has dangling nodes (the all-zero rows 5 and 7 of G_1 ), so we eliminate them by connecting each dangling node to every node: G_1 + \frac{1}{M}\mathbf{d}e^{T} . In addition, the random jump with probability 1 - \alpha = 1/8 must be taken into account, which gives the Google matrix equation:

7/8\left( G_1 + \frac{1}{M}\mathbf{de^{T}} \right) + \frac{1 - 7/8}{M}\mathbf{ee^T}

Finally, the Google matrix is calculated to be:

G =  \begin{matrix}  1/64 & 15/64 & 15/64 & 15/64 & 1/64 & 1/64 & 1/64 & 15/64 \\  15/64 & 1/64 & 15/64 & 1/64 & 15/64 & 1/64 & 15/64 & 1/64 \\  1/64 & 59/192 & 1/64 & 59/192 & 1/64 & 1/64 & 59/192 & 1/64 \\  1/64 & 1/64 & 1/64 & 1/64 & 29/64 & 29/64 & 1/64 & 1/64 \\  1/8 & 1/8 & 1/8 & 1/8 & 1/8 & 1/8 & 1/8 & 1/8 \\  59/192 & 1/64 & 59/192 & 1/64 & 1/64 & 1/64 & 59/192 & 1/64 \\  1/8 & 1/8 & 1/8 & 1/8 & 1/8 & 1/8 & 1/8 & 1/8 \\  59/192 & 1/64 & 59/192 & 1/64 & 59/192 & 1/64 & 1/64 & 1/64  \end{matrix}

In case there is any ambiguity, runnable Python code is provided below.

def constructGoogleMatrix(adjacency, alpha):
    """
    :type   adjacency: Dict[int, Set[int]]  (node index -> out-links)
    :type   alpha: float, probability of following a link
    :rtype  G: List[List[float]]
    """
    n = len(adjacency)
    jump = (1 - alpha) / n                     # random-jump mass in every entry
    G = [[jump] * n for _ in range(n)]
    for node, neighbours in adjacency.items():
        if neighbours:                         # spread alpha over the out-links
            weight = alpha / len(neighbours)
            for neighbour in neighbours:
                G[node][neighbour] += weight
        else:                                  # dangling node: link to everyone
            for j in range(n):
                G[node][j] += alpha / n
    return G
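For example, reading the adjacency off G_1 above (nodes numbered 0-7), the Google matrix from Figure 1 can be reproduced with:

adjacency = {
    0: {1, 2, 3, 7},
    1: {0, 2, 4, 6},
    2: {1, 3, 6},
    3: {4, 5},
    4: set(),        # dangling
    5: {0, 2, 6},
    6: set(),        # dangling
    7: {0, 2, 4},
}
G = constructGoogleMatrix(adjacency, alpha=7/8)
# G[0][1] == 15/64, and G[4] is a row of 1/8, matching the matrix above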