To model the activity of a random Web surfer, the PageRank algorithm represents the link structure of the Web as a directed graph:
H = \begin{bmatrix} 0.0 & 1.0 & 0.0 & 0.0 \\ 0.0 & 0.0 & 1.0 & 0.0 \\ 0.5 & 0.0 & 0.0 & 0.5 \\ 0.0 & 0.0 & 0.0 & 0.0 \end{bmatrix}
If webpage i has l_i \geq 1 links to other webpages and webpage i links to webpage j, then the element in row i and column j of H is H_{ij} = \frac{1}{l_i}. Otherwise, H_{ij} = 0.
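As a minimal MATLAB sketch (my own, not from the paper), H can be built from a 0/1 adjacency matrix A, where A(i,j) = 1 if webpage i links to webpage j:

A = [0 1 0 0; 0 0 1 0; 1 0 0 1; 0 0 0 0];  % the example graph above
l = sum(A, 2);                    % l(i) = number of outlinks of page i
H = zeros(size(A));
for i = 1:size(A, 1)
    if l(i) >= 1
        H(i, :) = A(i, :) / l(i); % H(i,j) = 1/l_i on each outlink
    end                           % rows with no outlinks stay all zero
end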
If a webpage has no links to other webpages, it is called a dangling node. To fix this, the matrix is adjusted to S = H + dw, where d is a column vector that identifies dangling nodes, meaning d_{j} = 1 if l_{j} = 0 and d_{j} = 0 otherwise; and w = (w_{1}, w_{2}, \ldots, w_{n}) is a row vector with w_{j} \geq 0 for all 1 \leq j \leq n and \sum\limits_{j=1}^n w_{j} = 1. There are several options for w; the most popular choice is the uniform row vector, which sends the surfer to every webpage with probability 1/n:
w = \begin{bmatrix} 0.25 & 0.25 & 0.25 & 0.25 \end{bmatrix}
S = \begin{bmatrix} 0.0 & 1.0 & 0.0 & 0.0 \\ 0.0 & 0.0 & 1.0 & 0.0 \\ 0.5 & 0.0 & 0.0 & 0.5 \\ 0.0 & 0.0 & 0.0 & 0.0 \end{bmatrix} + \begin{bmatrix} 0.0 \\ 0.0 \\ 0.0 \\ 1.0 \end{bmatrix} \begin{pmatrix} 0.25 & 0.25 & 0.25 & 0.25 \end{pmatrix}
S = \begin{bmatrix} 0.0 & 1.0 & 0.0 & 0.0 \\ 0.0 & 0.0 & 1.0 & 0.0 \\ 0.5 & 0.0 & 0.0 & 0.5 \\ 0.25 & 0.25 & 0.25 & 0.25 \end{bmatrix}
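Continuing the MATLAB sketch, the dangling-node fix is a rank-one update (here with the uniform choice of w from the text):

n = size(H, 1);
d = double(l == 0);      % column vector flagging dangling nodes
w = ones(1, n) / n;      % uniform row vector, w_j = 1/n
S = H + d * w;           % each dangling row is replaced by w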
To model the overall behaviour of a random Web surfer, Google forms the matrix G = \alpha S + (1 - \alpha)1v, where 0 \leq \alpha < 1 is a scalar, 1 is the column vector of ones, and v is a row probability distribution vector called the personalization vector. The damping factor \alpha means that with probability 1 - \alpha the random Web surfer moves to a different webpage by some means other than selecting a link, e.g. by typing a URL directly.
G = \alpha S + (1 - \alpha) 1v
G = 0.85 \begin{bmatrix} 0.0 & 1.0 & 0.0 & 0.0 \\ 0.0 & 0.0 & 1.0 & 0.0 \\ 0.5 & 0.0 & 0.0 & 0.5 \\ 0.25 & 0.25 & 0.25 & 0.25 \end{bmatrix} + (1 - 0.85) \begin{bmatrix} 1.0 \\ 1.0 \\ 1.0 \\ 1.0 \end{bmatrix} \begin{pmatrix} 0.25 & 0.25 & 0.25 & 0.25 \end{pmatrix}
G = \begin{bmatrix} 3/80 & 71/80 & 3/80 & 3/80 \\ 3/80 & 3/80 & 71/80 & 3/80 \\ 37/80 & 3/80 & 3/80 & 37/80 \\ 1/4 & 1/4 & 1/4 & 1/4 \end{bmatrix}
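In MATLAB the same construction is one line (a sketch, again assuming the uniform personalization vector):

alpha = 0.85;
v = ones(1, n) / n;                            % uniform personalization vector
G = alpha * S + (1 - alpha) * ones(n, 1) * v;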
To compute the PageRank score of each webpage, we must solve the eigensystem \pi G = \pi, where \pi is a row vector. Given a starting vector \pi^{(0)}, e.g. \pi^{(0)} = v, the power method calculates the successive iterates
\pi^{(1)} = \pi^{(0)}G
\pi^{(2)} = \pi^{(1)}G
\vdots
\pi^{(k + 1)} = \pi^{(k)}G,\quad k = 0, 1, 2, \ldots
until some convergence criterion is satisfied, i.e. until \pi^{(k+1)} is nearly equal to \pi^{(k)}. The limiting vector is the PageRank vector.
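A minimal MATLAB sketch of this iteration (the tolerance 1e-8 and the iteration cap are my own choices, not from the paper):

pik = v;                            % starting vector pi^(0) = v
for k = 1:1000                      % iteration cap (assumed)
    piNext = pik * G;               % pi^(k+1) = pi^(k) G
    if norm(piNext - pik, 1) < 1e-8 % convergence test (assumed)
        break
    end
    pik = piNext;
end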
\noindent\textbf{2. Explain how the eigenvalue problem is solved. You don’t need to rewrite what is in the paper, just put down what you understand.}
As explained above, to compute the PageRank score of each webpage we must solve the eigensystem \pi G = \pi. A stationary probability vector \pi is a vector that does not change under application of the transition matrix; that is, it is a left eigenvector of the matrix G associated with the eigenvalue 1. We use the power method to calculate this eigenvector. Given a starting vector \pi^{(0)}, e.g. \pi^{(0)} = v, the power method calculates the successive iterates
\pi^{(1)} = \pi^{(0)}G
\pi^{(2)} = \pi^{(1)}G
\vdots
\pi^{(k + 1)} = \pi^{(k)}G,\quad k = 0, 1, 2, \ldots
until the convergence criterion is satisfied, i.e. until \pi^{(k+1)} is nearly equal to \pi^{(k)}; the result is the PageRank vector. Because G is a positive stochastic matrix, it has a unique such eigenvector (this is the Perron-Frobenius theorem). We find that eigenvector and normalize it so that the sum of its entries is 1. Then we use this vector to rank the webpages: the page with the largest entry is ranked first, and so on.
\pi^{(0)} = \begin{bmatrix} 0.250 & 0.250 & 0.250 & 0.250 \end{bmatrix}
\pi^{(1)} = \begin{bmatrix} 0.196 & 0.303 & 0.303 & 0.196 \end{bmatrix}
\pi^{(2)} = \begin{bmatrix} 0.208 & 0.246 & 0.337 & 0.208 \end{bmatrix}
\pi^{(3)} = \begin{bmatrix} 0.225 & 0.258 & 0.291 & 0.225 \end{bmatrix}
\vdots
\pi^{(28)} = \begin{bmatrix} 0.213 & 0.264 & 0.307 & 0.213 \end{bmatrix}
\pi^{(29)} = \begin{bmatrix} 0.213 & 0.264 & 0.307 & 0.213 \end{bmatrix}
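As a cross-check (my own, not part of the paper's algorithm), MATLAB can also find this left eigenvector directly, since the left eigenvectors of G are the right eigenvectors of G':

[V, D] = eig(G');
[~, idx] = max(real(diag(D)));  % locate the eigenvalue 1
p = real(V(:, idx))';           % the corresponding left eigenvector of G
p = p / sum(p)                  % scale so the entries sum to 1

Up to rounding, this reproduces \pi^{(29)} above.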
Ranking\ vector = \begin{bmatrix} 3 & 2 & 1 & 3 \end{bmatrix}
I did these calculations in MATLAB; after 29 iterations the iterates agree to three decimal places, so we can read the PageRank ranking off \pi^{(29)}: page 3 is ranked first, page 2 second, and pages 1 and 4 tie for third.
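For completeness, here is one way (my own sketch) to turn the scores into ranks in MATLAB, with tied scores sharing a rank:

scores = [0.213 0.264 0.307 0.213];  % entries of pi^(29)
[~, ~, ranks] = unique(-scores);     % dense ranking, largest score first
ranks(:)'                            % prints 3 2 1 3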
\noindent\textbf{3. Starting with a 5x5 Markov matrix, apply the algorithm described in Section 5 of the paper to find the eigenvector associated with eigenvalue 1.}
Let us assume \alpha = 0.85 and a web structure like this:
H = \begin{bmatrix} 0.0 & 0.5 & 0.0 & 0.0 & 0.5 \\ 0.0 & 0.0 & 1.0 & 0.0 & 0.0 \\ 0.5 & 0.0 & 0.0 & 0.5 & 0.0 \\ 0.25 & 0.25 & 0.25 & 0.0 & 0.25 \\ 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \end{bmatrix} \qquad v = \begin{pmatrix} 0.20 & 0.20 & 0.20 & 0.20 & 0.20 \end{pmatrix}
As you can see, there is a problem with this matrix: the last row has no links to other webpages, so the graph contains a dangling node. To handle it, we replace that row with the uniform distribution, assigning probability 1/n to every webpage (the most popular choice). The resulting matrix is S = H + dw.
S = \begin{bmatrix} 0.0 & 0.5 & 0.0 & 0.0 & 0.5 \\ 0.0 & 0.0 & 1.0 & 0.0 & 0.0 \\ 0.5 & 0.0 & 0.0 & 0.5 & 0.0 \\ 0.25 & 0.25 & 0.25 & 0.0 & 0.25 \\ 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \end{bmatrix} + \begin{bmatrix} 0.0 \\ 0.0 \\ 0.0 \\ 0.0 \\ 1.0 \end{bmatrix} \begin{pmatrix} 0.20 & 0.20 & 0.20 & 0.20 & 0.20 \end{pmatrix}
S = \begin{bmatrix} 0.0 & 0.5 & 0.0 & 0.0 & 0.5 \\ 0.0 & 0.0 & 1.0 & 0.0 & 0.0 \\ 0.5 & 0.0 & 0.0 & 0.5 & 0.0 \\ 0.25 & 0.25 & 0.25 & 0.0 & 0.25 \\ 0.20 & 0.20 & 0.20 & 0.20 & 0.20 \end{bmatrix} \qquad v = \begin{pmatrix} 0.20 & 0.20 & 0.20 & 0.20 & 0.20 \end{pmatrix}
Even just multiplying a huge vector by a huge matrix could take a while, but for the particular matrix G we can compute \pi G much more easily than it might first appear. Note that
G = \alpha S + (1 - \alpha) 1v
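An aside (my own derivation, not spelled out in the paper's example): since \pi^{(k)} sums to 1, \pi 1 = 1, so \pi G = \alpha \pi S + (1 - \alpha)v = \alpha \pi H + \alpha(\pi d)w + (1 - \alpha)v. Each power-method step therefore only needs the sparse matrix H, never the dense G; in MATLAB, with H, d, w, v as above:

piNext = alpha * (pik * H) + alpha * (pik * d) * w + (1 - alpha) * v;

For our small 5x5 example, though, we can simply form G in full: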
G = 0.85 \begin{bmatrix} 0.0 & 0.5 & 0.0 & 0.0 & 0.5 \\ 0.0 & 0.0 & 1.0 & 0.0 & 0.0 \\ 0.5 & 0.0 & 0.0 & 0.5 & 0.0 \\ 0.25 & 0.25 & 0.25 & 0.0 & 0.25 \\ 0.20 & 0.20 & 0.20 & 0.20 & 0.20 \end{bmatrix} + (1 - 0.85) \begin{bmatrix} 1.0 \\ 1.0 \\ 1.0 \\ 1.0 \\ 1.0 \end{bmatrix} \begin{pmatrix} 0.2 & 0.2 & 0.2 & 0.2 & 0.2 \end{pmatrix}
G = \begin{bmatrix} 0 & 0.425 & 0 & 0 & 0.425 \\ 0 & 0 & 0.850 & 0 & 0 \\ 0.425 & 0 & 0 & 0.425 & 0 \\ 0.2125 & 0.2125 & 0.2125 & 0 & 0.2125 \\ 0.170 & 0.170 & 0.170 & 0.170 & 0.170 \end{bmatrix} + 0.15 \begin{bmatrix} 0.20 & 0.20 & 0.20 & 0.20 & 0.20 \\ 0.20 & 0.20 & 0.20 & 0.20 & 0.20 \\ 0.20 & 0.20 & 0.20 & 0.20 & 0.20 \\ 0.20 & 0.20 & 0.20 & 0.20 & 0.20 \\ 0.20 & 0.20 & 0.20 & 0.20 & 0.20 \end{bmatrix}
G = \begin{bmatrix} 0 & 0.425 & 0 & 0 & 0.425 \\ 0 & 0 & 0.850 & 0 & 0 \\ 0.425 & 0 & 0 & 0.425 & 0 \\ 0.2125 & 0.2125 & 0.2125 & 0 & 0.2125 \\ 0.170 & 0.170 & 0.170 & 0.170 & 0.170 \end{bmatrix} + \begin{bmatrix} 0.03 & 0.03 & 0.03 & 0.03 & 0.03 \\ 0.03 & 0.03 & 0.03 & 0.03 & 0.03 \\ 0.03 & 0.03 & 0.03 & 0.03 & 0.03 \\ 0.03 & 0.03 & 0.03 & 0.03 & 0.03 \\ 0.03 & 0.03 & 0.03 & 0.03 & 0.03 \end{bmatrix}
G = \begin{bmatrix} 0.030 & 0.455 & 0.030 & 0.030 & 0.455 \\ 0.030 & 0.030 & 0.880 & 0.030 & 0.030 \\ 0.455 & 0.030 & 0.030 & 0.455 & 0.030 \\ 0.2425 & 0.2425 & 0.2425 & 0.030 & 0.2425 \\ 0.200 & 0.200 & 0.200 & 0.200 & 0.200 \end{bmatrix}
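A quick sanity check (my own) that G really is a transition matrix, i.e. that every row sums to 1; here S and v are the 5x5 versions defined above:

G = alpha * S + (1 - alpha) * ones(5, 1) * v;  % the exact, unrounded G
max(abs(sum(G, 2) - 1))                        % ~ machine precision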
We use the power method to calculate the eigenvector. Given a starting vector \pi^{(0)}, e.g. \pi^{(0)} = v, the power method calculates successive iterates
\pi^{(0)} = \begin{bmatrix} 0.200 & 0.200 & 0.200 & 0.200 & 0.200 \end{bmatrix}
\pi^{(1)} = \begin{bmatrix} 0.191 & 0.191 & 0.276 & 0.149 & 0.191 \end{bmatrix}
\pi^{(2)} = \begin{bmatrix} 0.211 & 0.175 & 0.257 & 0.180 & 0.175 \end{bmatrix}
\pi^{(3)} = \begin{bmatrix} 0.207 & 0.188 & 0.247 & 0.169 & 0.188 \end{bmatrix}
\vdots
\pi^{(10)} = \begin{bmatrix} 0.205 & 0.185 & 0.254 & 0.169 & 0.185 \end{bmatrix}
\pi^{(11)} = \begin{bmatrix} 0.205 & 0.184 & 0.254 & 0.169 & 0.184 \end{bmatrix}
Ranking\ vector = \begin{bmatrix} 2 & 3 & 1 & 4 & 3 \end{bmatrix}
All of these calculations were done in MATLAB; after 11 iterations the iterates agree to three decimal places, so page 3 is ranked first, page 1 second, pages 2 and 5 tie for third, and page 4 is last.