The present-day “publish or perish” madness in academia involves counting the number of papers researchers have published, as well as the number of citations their papers have received.
(a) One might argue that not all citations are equally valuable: a citation in a paper that is itself often cited is more valuable than a citation in a paper that no one cites. Design a PageRank style algorithm which would rank papers according to their “importance”, and then use such an algorithm to rank researchers by their “importance”.
(b) Assume now that you do not have information on the citations in each published paper, but instead you have for every researcher a list of other researchers who have cited him and how many times they cited him. Design again a PageRank style algorithm which would rank researchers by their importance.
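We can view this as a PageRank problem in which each paper plays the role of a web page, each citation the role of a hyperlink, and the resulting paper ranking the role of the web page ranking. This gives a system of equations, one for each paper $P$ in the paper library:
$$\rho(P) = \sum_{Q \to P} \frac{\rho(Q)}{\#(Q)}$$
Here $\rho(P)$ is the importance score of $P$, the sum ranges over all papers $Q$ that cite $P$, and $\#(Q)$ is the number of papers that $Q$ cites. In other words, the importance of paper $P$ depends on the importance scores of the papers that cite $P$, each of which splits its score equally among its references.
We represent the citation relationship among the $N$ papers $P_1, \dots, P_N$ by an $N \times N$ matrix $G_1$ such that:
$$(G_1)_{ij} = \begin{cases} \dfrac{1}{\#(P_i)} & \text{if } P_i \text{ cites } P_j, \\ 0 & \text{otherwise.} \end{cases}$$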
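As a minimal sketch of this construction, the snippet below builds the citation matrix from per-paper reference lists. The function name `citation_matrix`, the input layout `cites[i]` (indices of the papers cited by paper `i`) and the toy data are assumptions made for illustration; ignoring duplicate and self-citations is likewise a choice of this sketch, not something the text prescribes.

```python
def citation_matrix(cites):
    """Build the citation matrix from cites[i] = indices of papers cited by paper i."""
    n = len(cites)
    g1 = [[0.0] * n for _ in range(n)]
    for i, refs in enumerate(cites):
        # Assumption: duplicate references and self-citations are ignored.
        targets = set(refs) - {i}
        for j in targets:
            # Paper i splits its importance equally among the papers it cites.
            g1[i][j] = 1.0 / len(targets)
    return g1


# Hypothetical toy library: paper 0 cites papers 1 and 2,
# paper 1 cites paper 2, and paper 2 cites nothing (a dangling paper).
cites = [[1, 2], [2], []]
g1 = citation_matrix(cites)
```

Every row of a paper that cites something sums to 1, while the row of a dangling paper stays all zero and has to be handled separately.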
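There may be dangling papers that cite no other paper; we resolve this by treating each of them as if it cited every paper in the library with equal weight. In addition, a reader going through a paper may jump to another paper at random rather than follow the citation list; we denote the probability of such a jump by $\alpha$. With $G_1$ the citation matrix, $d$ the column vector with $d_i = 1$ if $P_i$ is dangling and $d_i = 0$ otherwise, and $e$ the column vector of $N$ ones, the Google-matrix-like paper citation matrix can be written as:
$$G = (1 - \alpha)\left(G_1 + \frac{1}{N}\, d\, e^{T}\right) + \frac{\alpha}{N}\, e\, e^{T}$$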
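The two corrections can be sketched in a few lines of plain Python. Here `alpha` is taken to be the random-jump probability, and its default of 0.15 is only the conventional choice, not a value given in the text; the function name `google_matrix` and the toy matrix are hypothetical.

```python
def google_matrix(g1, alpha=0.15):
    """Apply the dangling-node fix and the random-jump term to a citation matrix."""
    n = len(g1)
    g = []
    for row in g1:
        # A row of zeros marks a dangling paper: spread its weight uniformly.
        base = row if any(row) else [1.0 / n] * n
        # Follow a citation with probability 1 - alpha, jump anywhere with alpha.
        g.append([(1 - alpha) * x + alpha / n for x in base])
    return g


# Hypothetical citation matrix for three papers; the last one is dangling.
g1 = [[0.0, 0.5, 0.5],
      [0.0, 0.0, 1.0],
      [0.0, 0.0, 0.0]]
g = google_matrix(g1)
```

After this step every row is a probability distribution with strictly positive entries, which is what makes the iteration converge to a unique ranking.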
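To compute the paper ranks iteratively, we use a vector $r^{(t)}$ to hold the intermediate result after round $t$, starting for instance from the uniform distribution $r^{(0)} = \frac{1}{N} e$. With $G$ the Google-matrix-like paper citation matrix, we repeat the update
$$\left(r^{(t+1)}\right)^{T} = \left(r^{(t)}\right)^{T} G$$
until it converges, for example until $\lVert r^{(t+1)} - r^{(t)} \rVert_1$ falls below a small tolerance. The converged vector then contains the importance score of each paper.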
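The iteration itself is a plain power method. The sketch below assumes a row-stochastic matrix `g` and uses an L1 tolerance as the stopping rule; the function name, the tolerance, the round limit and the example matrix are all illustrative assumptions.

```python
def power_iteration(g, tol=1e-10, max_rounds=1000):
    """Repeat r <- r G (r as a row vector) until the L1 change drops below tol."""
    n = len(g)
    r = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_rounds):
        nxt = [sum(r[i] * g[i][j] for i in range(n)) for j in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, r)) < tol:
            return nxt
        r = nxt
    return r


# Hypothetical Google matrix for three papers (every row sums to 1).
g = [[0.05, 0.475, 0.475],
     [0.05, 0.05, 0.90],
     [1 / 3, 1 / 3, 1 / 3]]
rank = power_iteration(g)
```

In this toy matrix the third paper receives weight from both of the others, so it ends up with the largest score.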
Finally, the importance score of a researcher $R$ equals the sum of the importance scores of all of his or her own papers, i.e.
$$\rho(R) = \sum_{P \text{ authored by } R} \rho(P)$$
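A small sketch of this aggregation step follows. The author names and paper scores are made up, and crediting a co-authored paper in full to each of its authors is an assumption of the sketch; the text only says that a researcher's own papers are summed.

```python
from collections import defaultdict


def researcher_scores(paper_rank, authors):
    """Sum paper scores per researcher; authors[p] lists the authors of paper p."""
    scores = defaultdict(float)
    for p, score in enumerate(paper_rank):
        for name in authors[p]:
            # Assumption: each co-author receives the paper's full score.
            scores[name] += score
    return dict(scores)


# Hypothetical converged paper scores and author lists.
paper_rank = [0.2, 0.3, 0.5]
authors = [["alice"], ["bob"], ["alice", "bob"]]
scores = researcher_scores(paper_rank, authors)
ranking = sorted(scores, key=scores.get, reverse=True)  # most important first
```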
This problem can still be modeled as a PageRank problem as in part (a), but the construction of the citation matrix needs a small modification.
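Firstly, since the data tell us how each researcher received citations from other researchers, we can directly construct only a receiving matrix rather than a citation matrix: let $k_{ij}$ be the number of times researcher $R_j$ cited researcher $R_i$, so that row $i$ of $K = (k_{ij})$ is the list given for $R_i$. Transposing it turns it into a citation matrix again.
Secondly, when calculating the adjacency weight of the link from node $j$ to node $i$, instead of setting it to $\frac{1}{\#(P_j)}$ (i.e. one over the number of outgoing links) as in part (a), we set it to $\frac{k_{ij}}{c_j}$. The numerator, the number of times node $j$ points to $i$, says that researcher $R_i$ has received $k_{ij}$ citations from the specific researcher $R_j$. The denominator is the total number of times $j$ points to anyone, i.e. how many citations $R_j$ has made in total; we denote it by $c_j = \sum_{i} k_{ij}$, and it is obtained by summing column $j$ of $K$. Normalizing by the citer's total, rather than by the total that $R_i$ has received, makes the outgoing weights of each researcher sum to 1, which the random-walk model requires.
From this analysis, we have the equation
$$\rho(R_i) = \sum_{j \,:\, k_{ij} > 0} \frac{k_{ij}}{c_j}\, \rho(R_j)$$
Using this equation, we can construct the researcher citation receiving matrix $W$, where $W_{ij} = \frac{k_{ij}}{c_j}$ stands for the weighted share of importance that researcher $R_i$ receives from $R_j$. Transposing it gives a citation matrix again, i.e. a matrix whose rows are indexed by the citing researcher, which works with the Google matrix construction as before. We denote it by $H_1 = W^{T}$.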
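The sketch below turns a table of received citation counts into a researcher citation matrix whose rows are indexed by the citing researcher. It assumes that `received[i][j]` holds the number of times researcher `j` cited researcher `i`, that each citer's counts are normalized by the total number of citations that citer made (so that rows sum to 1), and that self-citations are dropped; the function name and the counts are hypothetical.

```python
def researcher_citation_matrix(received):
    """received[i][j] = number of times researcher j cited researcher i."""
    m = len(received)
    # Total citations made by each researcher j: a column sum, self-citations excluded.
    made = [sum(received[i][j] for i in range(m) if i != j) for j in range(m)]
    h1 = [[0.0] * m for _ in range(m)]
    for j in range(m):          # j is the citing researcher (row of the result)
        for i in range(m):      # i is the cited researcher (column of the result)
            if i != j and made[j] > 0:
                h1[j][i] = received[i][j] / made[j]
    return h1


# Hypothetical counts: row i lists how often each researcher cited researcher i.
received = [[0, 3, 1],
            [1, 0, 1],
            [1, 0, 0]]
h1 = researcher_citation_matrix(received)
```

A researcher who cites no one produces an all-zero row, which plays the same role as a dangling paper in part (a).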
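Similarly, to handle dangling nodes and random jumps, we use the following equation to construct the Google-matrix-like researcher citation matrix with random jump probability $\alpha$, where $H_1$ is the researcher citation matrix, $M$ is the number of researchers, $d$ is the column vector marking the researchers who cite no one, and $e$ is the column vector of $M$ ones:
$$H = (1 - \alpha)\left(H_1 + \frac{1}{M}\, d\, e^{T}\right) + \frac{\alpha}{M}\, e\, e^{T}$$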
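Then we use $s^{(t)}$ to denote the importance vector of the researchers after round $t$ and, with $H$ the Google-matrix-like researcher citation matrix, iterate the equation
$$\left(s^{(t+1)}\right)^{T} = \left(s^{(t)}\right)^{T} H$$
until it converges. The converged vector $s$ then contains the importance score of each researcher, and sorting the researchers by decreasing importance score gives their ranking.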
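For a large number of researchers one would usually avoid materializing the dense Google matrix. The sketch below is a matrix-free variant of the same iteration, not the literal procedure described above: it applies the random-jump and dangling corrections inside each update and then sorts by decreasing score. The function name, the default `alpha = 0.15` (taken as the jump probability), the tolerance and the example matrix are assumptions.

```python
def rank_researchers(h1, names, alpha=0.15, tol=1e-10, max_rounds=1000):
    """Damped power iteration on a researcher citation matrix, then sort by score."""
    m = len(h1)
    s = [1.0 / m] * m
    for _ in range(max_rounds):
        # Score currently held by researchers who cite no one (all-zero rows).
        dangling_mass = sum(s[j] for j in range(m) if not any(h1[j]))
        nxt = [alpha / m
               + (1 - alpha) * (sum(s[j] * h1[j][i] for j in range(m))
                                + dangling_mass / m)
               for i in range(m)]
        done = sum(abs(a - b) for a, b in zip(nxt, s)) < tol
        s = nxt
        if done:
            break
    order = sorted(range(m), key=lambda i: s[i], reverse=True)
    return [(names[i], s[i]) for i in order]


# Hypothetical researcher citation matrix: row j holds the shares of
# researcher j's citations going to each other researcher.
h1 = [[0.0, 0.5, 0.5],
      [1.0, 0.0, 0.0],
      [0.5, 0.5, 0.0]]
ranking = rank_researchers(h1, ["R1", "R2", "R3"])
```

The result is a list of `(name, score)` pairs with the most important researcher first; since the scores stay a probability distribution, the constant `alpha / m` is exactly the contribution of the random-jump term.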