To model the activity of a random Web surfer, the PageRank algorithm represents the link structure of the Web as a directed graph.
$$ H = \begin{bmatrix}
0.0 & 1.0 & 0.0 & 0.0 \\
0.0 & 0.0 & 1.0 & 0.0 \\
0.5 & 0.0 & 0.0 & 0.5 \\
0.0 & 0.0 & 0.0 & 0.0 \end{bmatrix} $$
If webpage i has \(l_i \geq 1\) links to other webpages and webpage i links to webpage j, then the element in row i and column j of H is \(H_{ij} = \frac{1}{l_i}\); otherwise, \(H_{ij} = 0\).
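The rule \(H_{ij} = 1/l_i\) is straightforward to implement. Below is a minimal Python sketch (not from the paper) that builds H for the 4-page example; the `outlinks` dictionary is an assumed encoding of the link graph, with pages numbered from 0:

```python
# Build the hyperlink matrix H from each page's list of outlinks.
# Pages are numbered 0..3; page 3 has no outlinks (a dangling node).
outlinks = {0: [1], 1: [2], 2: [0, 3], 3: []}

n = len(outlinks)
H = [[0.0] * n for _ in range(n)]
for i, targets in outlinks.items():
    for j in targets:
        H[i][j] = 1.0 / len(targets)  # H_ij = 1 / l_i for each link i -> j

for row in H:
    print(row)
```

Note that the dangling page 3 produces an all-zero row, which is exactly the problem the next step fixes.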
A webpage with no links to other webpages is called a dangling node; its row of H sums to zero instead of one. The fix is the matrix S = H + dw, where d is a column vector that identifies dangling nodes, meaning \(d_{i}\) = 1 if \(l_{i}\) = 0 and \(d_{i}\) = 0 otherwise; and w = \(\begin{pmatrix} w_{1} & w_{2} & \dots & w_{n} \end{pmatrix}\) is a row vector with \(w_{j} \geq 0\) for all \(1 \leq j \leq n\) and \(\sum\limits_{j=1}^n w_{j} = 1\). There are several options for choosing w; the most popular is to give every webpage the same probability 1/n, i.e. the uniform row vector
$$ w = \begin{pmatrix} 0.25 & 0.25 & 0.25 & 0.25 \end{pmatrix} $$
$$ S = \begin{bmatrix}
0.0 & 1.0 & 0.0 & 0.0 \\
0.0 & 0.0 & 1.0 & 0.0 \\
0.5 & 0.0 & 0.0 & 0.5 \\
0.0 & 0.0 & 0.0 & 0.0 \end{bmatrix} +
\begin{bmatrix} 0.0 \\ 0.0 \\ 0.0 \\ 1.0 \end{bmatrix}
\begin{pmatrix} 0.25 & 0.25 & 0.25 & 0.25 \end{pmatrix}
$$
$$ S = \begin{bmatrix}
0.0 & 1.0 & 0.0 & 0.0 \\
0.0 & 0.0 & 1.0 & 0.0 \\
0.5 & 0.0 & 0.0 & 0.5 \\
0.25 & 0.25 & 0.25 & 0.25 \end{bmatrix}
$$
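The dangling-node fix S = H + dw can be sketched in a few lines of Python (a minimal illustration for the 4-page example; real H matrices are huge and sparse, so this is only for intuition):

```python
# Dangling-node fix: S = H + d*w for the 4-page example.
H = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.5, 0.0, 0.0, 0.5],
    [0.0, 0.0, 0.0, 0.0],  # dangling node: row sums to 0
]
n = len(H)
w = [1.0 / n] * n  # the uniform choice for w

# d_i = 1 exactly when page i has no outlinks (row i of H is all zeros)
d = [1.0 if sum(row) == 0.0 else 0.0 for row in H]

# The outer product d*w replaces each dangling row with w.
S = [[H[i][j] + d[i] * w[j] for j in range(n)] for i in range(n)]

for row in S:
    print(row)  # every row of S now sums to 1 (row-stochastic)
```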
To model the overall behaviour of a random Web surfer, Google forms the matrix G = \(\alpha\)S + (1 - \(\alpha\))1v, where 0 \(\leq\) \(\alpha\) \(<\) 1 is a scalar, 1 is the column vector of ones, and v is a row probability distribution vector called the personalization vector. The damping factor \(\alpha\) in the Google matrix models the fact that, with probability 1 - \(\alpha\), a random Web surfer moves to a different webpage by some means other than selecting a link.
$$ G = \alpha S + (1 - \alpha) 1v $$
$$
G = 0.85 \begin{bmatrix}
0.0 & 1.0 & 0.0 & 0.0 \\
0.0 & 0.0 & 1.0 & 0.0 \\
0.5 & 0.0 & 0.0 & 0.5 \\
0.25 & 0.25 & 0.25 & 0.25 \end{bmatrix} +
(1 - 0.85)
\begin{bmatrix} 1.0 \\ 1.0 \\ 1.0 \\ 1.0 \end{bmatrix}
\begin{pmatrix} 0.25 & 0.25 & 0.25 & 0.25 \end{pmatrix}
$$
$$
G = \begin{bmatrix}
3/80 & 71/80 & 3/80 & 3/80 \\
3/80 & 3/80 & 71/80 & 3/80 \\
37/80 & 3/80 & 3/80 & 37/80 \\
20/80 & 20/80 & 20/80 & 20/80 \end{bmatrix}
$$
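The construction of G is also easy to sketch in Python (a minimal illustration of the formula above; in practice G is never formed explicitly because it is dense):

```python
# Google matrix G = alpha*S + (1 - alpha)*1*v for the 4-page example,
# with alpha = 0.85 and uniform personalization vector v.
alpha = 0.85
S = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.5, 0.0, 0.0, 0.5],
    [0.25, 0.25, 0.25, 0.25],
]
n = len(S)
v = [1.0 / n] * n

# The outer product 1*v is simply v repeated in every row.
G = [[alpha * S[i][j] + (1.0 - alpha) * v[j] for j in range(n)] for i in range(n)]

print(G[0][:2])  # 3/80 = 0.0375 and 71/80 = 0.8875
```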
To compute the PageRank score of each webpage, we must solve the eigensystem \(\pi\)G = \(\pi\). Given a starting vector \(\pi^{(0)}\), e.g. \(\pi^{(0)}\) = v, the power method calculates the successive iterates
$$ \pi^{(1)} = \pi^{(0)}G $$$$ \pi^{(2)} = \pi^{(1)}G $$$$ \vdots $$$$ \pi^{(k + 1)} = \pi^{(k)}G,\ k = 0, 1, 2, \dots $$
until some convergence criterion is satisfied, i.e. until \(\pi^{(k+1)}\) is nearly equal to \(\pi^{(k)}\). The resulting vector is the PageRank vector.
\noindent\textbf{2. Explain how the eigenvalue problem is solved. You don’t need to rewrite what is in the paper, just put down what you understand.}
As explained above, to compute the PageRank score of each webpage we must solve the eigensystem \(\pi\)G = \(\pi\). A stationary probability vector \(\pi\) is defined as a vector that does not change under application of the transition matrix; that is, it is a left eigenvector of the matrix G associated with eigenvalue 1. We use the power method to calculate this eigenvector. Given a starting vector \(\pi^{(0)}\), e.g. \(\pi^{(0)}\) = v, the power method calculates the successive iterates
$$ \pi^{(1)} = \pi^{(0)}G $$$$ \pi^{(2)} = \pi^{(1)}G $$$$ \vdots $$$$ \pi^{(k + 1)} = \pi^{(k)}G,\ k = 0, 1, 2, \dots $$
until a convergence criterion is satisfied, i.e. until \(\pi^{(k+1)}\) is nearly equal to \(\pi^{(k)}\); the result is the PageRank vector. Because G is a positive stochastic matrix, it has a unique left eigenvector with eigenvalue 1. We find that eigenvector and normalize it so that the sum of its entries is 1. Then we use this vector to rank the webpages: the page with the largest entry is ranked first, and so on.
$$\pi^{(0)} = \begin{bmatrix} 0.250 & 0.250 & 0.250 & 0.250 \end{bmatrix} $$$$\pi^{(1)} = \begin{bmatrix} 0.196 & 0.303 & 0.303 & 0.196 \end{bmatrix} $$$$\pi^{(2)} = \begin{bmatrix} 0.208 & 0.246 & 0.337 & 0.208 \end{bmatrix} $$$$\pi^{(3)} = \begin{bmatrix} 0.225 & 0.258 & 0.291 & 0.225 \end{bmatrix} $$$$\vdots$$$$\pi^{(28)} = \begin{bmatrix} 0.213 & 0.264 & 0.307 & 0.213 \end{bmatrix} $$$$\pi^{(29)} = \begin{bmatrix} 0.213 & 0.264 & 0.307 & 0.213 \end{bmatrix} $$
$$Ranking = \begin{bmatrix} 3 & 2 & 1 & 3 \end{bmatrix} $$
I carried out these calculations in MATLAB; after 29 iterations the iterates agree to three decimal places, so ranking the entries of \(\pi^{(29)}\) gives the PageRank ordering: page 3 first, page 2 second, and pages 1 and 4 tied for third.
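The same iteration is easy to reproduce outside MATLAB. Here is a pure-Python sketch of the power method for the 4-page Google matrix (the tolerance and iteration cap are assumptions, not from the paper):

```python
# Power method for the 4-page Google matrix: repeat pi <- pi*G until
# successive iterates agree to within a small tolerance.
alpha = 0.85
S = [
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.5, 0.0, 0.0, 0.5],
    [0.25, 0.25, 0.25, 0.25],
]
n = len(S)
v = [1.0 / n] * n
G = [[alpha * S[i][j] + (1.0 - alpha) * v[j] for j in range(n)] for i in range(n)]

pi = v[:]  # pi^(0) = v
for _ in range(1000):
    # Left multiplication: nxt_j = sum_i pi_i * G_ij
    nxt = [sum(pi[i] * G[i][j] for i in range(n)) for j in range(n)]
    if max(abs(a - b) for a, b in zip(nxt, pi)) < 1e-10:
        pi = nxt
        break
    pi = nxt

print(pi)  # approximately [0.2138, 0.2646, 0.3079, 0.2138]
```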
\noindent\textbf{3. Starting with a 5x5 Markov matrix, apply the algorithm described in Section 5 of the paper to find the eigenvector associated with eigenvalue 1.}
Let us assume \(\alpha\) = 0.85 and a web structure like this:
$$ H = \begin{bmatrix}
0.0 & 0.5 & 0.0 & 0.0 & 0.5 \\
0.0 & 0.0 & 1.0 & 0.0 & 0.0 \\
0.5 & 0.0 & 0.0 & 0.5 & 0.0 \\
0.25 & 0.25 & 0.25 & 0.0 & 0.25 \\
0.0 & 0.0 & 0.0 & 0.0 & 0.0 \end{bmatrix}
v = \begin{bmatrix} 0.20 \\ 0.20 \\ 0.20 \\ 0.20 \\ 0.20 \end{bmatrix}
$$
As you can see, there is a problem in that matrix: the last row has no links to other webpages, which means the graph contains a dangling node. To handle it, we set those probabilities to 1/n (the most popular choice). The resulting matrix is S = H + dw.
$$ S = \begin{bmatrix}
0.0 & 0.5 & 0.0 & 0.0 & 0.5 \\
0.0 & 0.0 & 1.0 & 0.0 & 0.0 \\
0.5 & 0.0 & 0.0 & 0.5 & 0.0 \\
0.25 & 0.25 & 0.25 & 0.0 & 0.25 \\
0.0 & 0.0 & 0.0 & 0.0 & 0.0 \end{bmatrix} +
\begin{bmatrix} 0.0 \\ 0.0 \\ 0.0 \\ 0.0 \\ 1.0 \end{bmatrix}
\begin{pmatrix} 0.20 & 0.20 & 0.20 & 0.20 & 0.20 \end{pmatrix}
$$
$$ S = \begin{bmatrix}
0.0 & 0.5 & 0.0 & 0.0 & 0.5 \\
0.0 & 0.0 & 1.0 & 0.0 & 0.0 \\
0.5 & 0.0 & 0.0 & 0.5 & 0.0 \\
0.25 & 0.25 & 0.25 & 0.0 & 0.25 \\
0.20 & 0.20 & 0.20 & 0.20 & 0.20 \end{bmatrix}
v = \begin{bmatrix} 0.20 \\ 0.20 \\ 0.20 \\ 0.20 \\ 0.20 \end{bmatrix}
$$
At Web scale, even multiplying a huge vector by a huge matrix could take a while. But for the particular matrix G, the products can be computed much more easily than it might first appear, because G is a rank-one correction of the sparse matrix S. Note that
$$ G = \alpha S + (1 - \alpha) 1v $$$$ G = 0.85 \begin{bmatrix}
0.0 & 0.5 & 0.0 & 0.0 & 0.5 \\
0.0 & 0.0 & 1.0 & 0.0 & 0.0 \\
0.5 & 0.0 & 0.0 & 0.5 & 0.0 \\
0.25 & 0.25 & 0.25 & 0.0 & 0.25 \\
0.20 & 0.20 & 0.20 & 0.20 & 0.20 \end{bmatrix} +
(1 - 0.85)
\begin{bmatrix} 1.0 \\ 1.0 \\ 1.0 \\ 1.0 \\ 1.0 \end{bmatrix}
\begin{pmatrix} 0.2 & 0.2 & 0.2 & 0.2 & 0.2 \end{pmatrix}
$$
$$
G = \begin{bmatrix}
0 & 0.425 & 0 & 0 & 0.425 \\
0 & 0 & 0.850 & 0 & 0 \\
0.425 & 0 & 0 & 0.425 & 0 \\
0.2125 & 0.2125 & 0.2125 & 0 & 0.2125 \\
0.170 & 0.170 & 0.170 & 0.170 & 0.170 \end{bmatrix} +
0.15
\begin{bmatrix}
0.20 & 0.20 & 0.20 & 0.20 & 0.20 \\
0.20 & 0.20 & 0.20 & 0.20 & 0.20 \\
0.20 & 0.20 & 0.20 & 0.20 & 0.20 \\
0.20 & 0.20 & 0.20 & 0.20 & 0.20 \\
0.20 & 0.20 & 0.20 & 0.20 & 0.20 \end{bmatrix}
$$
$$
G = \begin{bmatrix}
0 & 0.425 & 0 & 0 & 0.425 \\
0 & 0 & 0.850 & 0 & 0 \\
0.425 & 0 & 0 & 0.425 & 0 \\
0.2125 & 0.2125 & 0.2125 & 0 & 0.2125 \\
0.170 & 0.170 & 0.170 & 0.170 & 0.170 \end{bmatrix} +
\begin{bmatrix}
0.03 & 0.03 & 0.03 & 0.03 & 0.03 \\
0.03 & 0.03 & 0.03 & 0.03 & 0.03 \\
0.03 & 0.03 & 0.03 & 0.03 & 0.03 \\
0.03 & 0.03 & 0.03 & 0.03 & 0.03 \\
0.03 & 0.03 & 0.03 & 0.03 & 0.03 \end{bmatrix}
$$
$$
G = \begin{bmatrix}
0.030 & 0.455 & 0.030 & 0.030 & 0.455 \\
0.030 & 0.030 & 0.880 & 0.030 & 0.030 \\
0.455 & 0.030 & 0.030 & 0.455 & 0.030 \\
0.2425 & 0.2425 & 0.2425 & 0.030 & 0.2425 \\
0.200 & 0.200 & 0.200 & 0.200 & 0.200 \end{bmatrix}
$$
We use the power method to calculate the eigenvector. Given a starting vector \(\pi^{(0)}\), e.g. \(\pi^{(0)}\) = v, the power method calculates successive iterates
$$\pi^{(0)} = \begin{bmatrix} 0.200 & 0.200 & 0.200 & 0.200 & 0.200 \end{bmatrix} $$$$\pi^{(1)} = \begin{bmatrix} 0.191 & 0.191 & 0.276 & 0.149 & 0.191 \end{bmatrix} $$$$\pi^{(2)} = \begin{bmatrix} 0.211 & 0.175 & 0.257 & 0.180 & 0.175 \end{bmatrix} $$$$\pi^{(3)} = \begin{bmatrix} 0.207 & 0.188 & 0.247 & 0.169 & 0.188 \end{bmatrix} $$$$\vdots$$$$\pi^{(10)} = \begin{bmatrix} 0.205 & 0.185 & 0.254 & 0.169 & 0.185 \end{bmatrix} $$$$\pi^{(11)} = \begin{bmatrix} 0.205 & 0.184 & 0.254 & 0.169 & 0.184 \end{bmatrix} $$$$Ranking = \begin{bmatrix} 2 & 3 & 1 & 4 & 3 \end{bmatrix} $$
All of these calculations were done in MATLAB.
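The whole 5-page computation can also be sketched end to end in pure Python (pages are indexed from 0 here, and the tolerance and iteration cap are assumptions):

```python
# End-to-end sketch for the 5-page example: dangling-node fix,
# Google matrix, power method, and ranking.
alpha = 0.85
H = [
    [0.0, 0.5, 0.0, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.5, 0.0],
    [0.25, 0.25, 0.25, 0.0, 0.25],
    [0.0, 0.0, 0.0, 0.0, 0.0],  # dangling node
]
n = len(H)
v = [1.0 / n] * n
d = [1.0 if sum(row) == 0.0 else 0.0 for row in H]  # d_i = 1 iff page i is dangling
S = [[H[i][j] + d[i] * v[j] for j in range(n)] for i in range(n)]
G = [[alpha * S[i][j] + (1.0 - alpha) * v[j] for j in range(n)] for i in range(n)]

pi = v[:]  # pi^(0) = v
for _ in range(1000):
    nxt = [sum(pi[i] * G[i][j] for i in range(n)) for j in range(n)]
    if max(abs(a - b) for a, b in zip(nxt, pi)) < 1e-10:
        pi = nxt
        break
    pi = nxt

# Sort page indices from largest PageRank score to smallest.
order = sorted(range(n), key=lambda j: -pi[j])
print(pi)     # approximately [0.206, 0.185, 0.255, 0.170, 0.185]
print(order)  # the third page (index 2) is ranked first
```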