Full Version: What's In The Brin That Links May Character?
Jon Awbrey
I've been meaning to discuss this in detail some day …

QUOTE

Sergey Brin and Lawrence Page : The Anatomy of a Large-Scale Hypertextual Web Search Engine

2.1.2. Intuitive Justification

PageRank can be thought of as a model of user behavior. We assume there is a "random surfer" who is given a web page at random and keeps clicking on links, never hitting "back" but eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank. And, the d damping factor is the probability at each page the "random surfer" will get bored and request another random page. One important variation is to only add the damping factor d to a single page, or a group of pages. This allows for personalization and can make it nearly impossible to deliberately mislead the system in order to get a higher ranking. We have several other extensions to PageRank, again see (Page, 1998).

Another intuitive justification is that a page can have a high PageRank if there are many pages that point to it, or if there are some pages that point to it and have a high PageRank. Intuitively, pages that are well cited from many places around the web are worth looking at. Also, pages that have perhaps only one citation from something like the Yahoo! homepage are also generally worth looking at. If a page was not high quality, or was a broken link, it is quite likely that Yahoo's homepage would not link to it. PageRank handles both these cases and everything in between by recursively propagating weights through the link structure of the web.
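
For anyone who wants to see the "recursively propagating weights" part in motion, here is a minimal sketch, not the paper's own code. It assumes the common normalized form of the update, with d = 0.85 taken as the probability of following a link (so 1 - d is the chance of getting bored and jumping). The function name pagerank, the toy_web graph, and the even-spread handling of pages with no outgoing links are all illustrative assumptions.

```python
def pagerank(links, d=0.85, iterations=100):
    """Approximate PageRank by repeated propagation of weights.

    links maps each page name to the list of pages it links to.
    d is assumed here to be the probability of following a link.
    """
    # Collect every page that appears as a source or as a target.
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform guess

    for _ in range(iterations):
        # Every page gets the "bored surfer" share first.
        new_rank = {p: (1.0 - d) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's weight evenly among its outgoing links.
                share = d * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumption: a page with no links spreads its weight evenly.
                share = d * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "hub" is cited by everyone else.
toy_web = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "a"],
}
ranks = pagerank(toy_web)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy graph "hub" comes out on top because every other page cites it, and "a" edges out "b" because it picks up one extra citation from "c", which is the "many citations, or a few weighty ones" intuition on a small scale.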


Jes not today …

Ja³
GlassBeadGame
QUOTE(Jon Awbrey @ Wed 28th January 2009, 10:24am) *

I've been meaning to discuss this in detail some day …

QUOTE

Sergey Brin and Lawrence Page : The Anatomy of a Large-Scale Hypertextual Web Search Engine

2.1.2. Intuitive Justification

PageRank can be thought of as a model of user behavior. We assume there is a "random surfer" who is given a web page at random and keeps clicking on links, never hitting "back" but eventually gets bored and starts on another random page. The probability that the random surfer visits a page is its PageRank. And, the d damping factor is the probability at each page the "random surfer" will get bored and request another random page. One important variation is to only add the damping factor d to a single page, or a group of pages. This allows for personalization and can make it nearly impossible to deliberately mislead the system in order to get a higher ranking. We have several other extensions to PageRank, again see (Page, 1998).

Another intuitive justification is that a page can have a high PageRank if there are many pages that point to it, or if there are some pages that point to it and have a high PageRank. Intuitively, pages that are well cited from many places around the web are worth looking at. Also, pages that have perhaps only one citation from something like the Yahoo! homepage are also generally worth looking at. If a page was not high quality, or was a broken link, it is quite likely that Yahoo's homepage would not link to it. PageRank handles both these cases and everything in between by recursively propagating weights through the link structure of the web.


Jes not today …

Ja³


I'm trying to understand the "damping factor." It seems to be a kind of threshold before clicks are counted, say clicks to one's own page to increase its rank, but I'm not sure. Does it subtract multiple hits from one IP? Or something else? Also, the "random surfer" probably described someone from before search engines became popular, who no longer exists.
Moulton
QUOTE(GlassBeadGame @ Wed 28th January 2009, 10:51am) *
I'm trying to understand the "damping factor." It seems to be a kind of threshold before clicks are counted, say clicks to one's own page to increase its rank, but I'm not sure. Does it subtract multiple hits from one IP? Or something else? Also, the "random surfer" probably described someone from before search engines became popular, who no longer exists.

Say you log in fresh for the day, totally bored out of your mind, having grown tired of whatever subject you were drilling down before you fell asleep at the keyboard the night before.

You surf randomly for a while, and eventually land on something that intrigues you. You click, read, and drill down until you have had your fill.

What's the proportion of links you click on while drilling down, compared to the proportion of pages you visit "fresh" (i.e. via an initial Google search driven by your own internal agenda)?

The "damping factor" corresponds to that drill-down proportion.
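
To put a number on that proportion, here is a rough simulation sketch. It assumes d = 0.85 is the chance of clicking a link on the current page, and that a "fresh" start is a uniformly random page rather than a search; the function name surf and the toy_web graph are made up for illustration.

```python
import random


def surf(links, d=0.85, steps=100_000, seed=0):
    """Simulate a random surfer and report how often each step was a click.

    links maps each page to the pages it links to; every link target is
    assumed to also appear as a key. Returns (visit counts, click fraction).
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    pages = sorted(links)
    visits = {p: 0 for p in pages}
    clicks = 0
    page = rng.choice(pages)  # log in "fresh" on some random page

    for _ in range(steps):
        visits[page] += 1
        if links[page] and rng.random() < d:
            # Drill down: follow one of the links on the current page.
            page = rng.choice(links[page])
            clicks += 1
        else:
            # Bored: start over on a random page.
            page = rng.choice(pages)
    return visits, clicks / steps


# Hypothetical four-page web, for illustration only.
toy_web = {
    "hub": ["a", "b", "c"],
    "a": ["hub"],
    "b": ["hub"],
    "c": ["hub", "a"],
}
visits, click_fraction = surf(toy_web)
print(f"fraction of steps that were link clicks: {click_fraction:.3f}")
print("most visited page:", max(visits, key=visits.get))
```

Over a long run the click fraction settles near d, and the share of visits each page receives approximates its rank under the same assumptions.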
GlassBeadGame
QUOTE(Moulton @ Wed 28th January 2009, 11:01am) *

QUOTE(GlassBeadGame @ Wed 28th January 2009, 10:51am) *
I'm trying to understand the "damping factor." It seems to be a kind of threshold before clicks are counted, say clicks to one's own page to increase its rank, but I'm not sure. Does it subtract multiple hits from one IP? Or something else? Also, the "random surfer" probably described someone from before search engines became popular, who no longer exists.

Say you log in fresh for the day, totally bored out of your mind, having grown tired of whatever subject you were drilling down before you fell asleep at the keyboard the night before.

You surf randomly for a while, and eventually land on something that intrigues you. You click, read, and drill down until you have had your fill.

What's the proportion of links you click on while drilling down, compared to the proportion of pages you visit "fresh" (i.e. via an initial Google search driven by your own internal agenda)?

The "damping factor" corresponds to that drill-down proportion.


Thanks for that, Moulton. It would seem to me that length of visit, and possibly "link away and return," would be important in calculating "drill down." They would signal a degree of interest. Of course, I might go to a three-paragraph page and walk away for twenty minutes.

Today web use seems to me much more purpose driven. What they describe seems more like using some Gopher to navigate Usenet. They really have changed things profoundly.
Jon Awbrey
If you really want a taste of just how rank Giggle's PageRank can be, try a search on Ampheck …

Ja³
EricBarbour
I've got something even ranker:

Sergey has claimed since 1998 that he read all these books while in high school and Stanford.

Possible, though I have my doubts.

(Also, can someone explain why Stanford keeps his personal website up? Bragging?)
Jon Awbrey
QUOTE(EricBarbour @ Thu 29th January 2009, 5:06am) *

I've got something even ranker:

Sergey has claimed since 1998 that he read all these books while in high school and Stanford.

Possible, though I have my doubts.

(Also, can someone explain why Stanford keeps his personal website up? Bragging?)


A bit OCR (Obsessive Compulsive Reader), but not otherwise unusual.

I'm guessing that Stanford & Sons keeps a lot of old junk around as a kind of technophile* chronofile.

Folks who want a taste of what a link matrix looks like on a smaller scale might have a look at this page-ranking application that someone created for the entries in PlanetMath.

Myyn.Org's PlanetMath Browser

Ja³

* No, it's not what some of you are thinking.
GlassBeadGame
QUOTE(Jon Awbrey @ Thu 29th January 2009, 6:54am) *

QUOTE(EricBarbour @ Thu 29th January 2009, 5:06am) *

I've got something even ranker:

Sergey has claimed since 1998 that he read all these books while in high school and Stanford.

Possible, though I have my doubts.

(Also, can someone explain why Stanford keeps his personal website up? Bragging?)


A bit OCR (Obsessive Compulsive Reader), but not otherwise unusual.

I'm guessing that Stanford & Sons keeps a lot of old junk around as a kind of technophile* chronofile.

Folks who want a taste of what a link matrix looks like on a smaller scale might have a look at this page-ranking application that someone created for the entries in PlanetMath.

Myyn.Org's PlanetMath Browser

Ja³

* No, it's not what some of you are thinking.


Note that Brin had read both Shakespeare's Othello and Othello, The Moor of Venice. He also read both Vonnegut's Slaughterhouse Five and Slaughterhouse-Five. Of course, these are just his favorite books; there were probably hundreds of books, slight title variations of the same work, that he read but didn't list. It could be that he showed a precocious ability, from an early age, for generating lists with no concern for underlying meaning.