Full Version: Google Loves Wikipedia - Even The Empty Pages - Search Newz
Newsfeed

Google Loves Wikipedia - Even The Empty Pages
Search Newz, KY - 57 minutes ago
If you're one of those kooky SEOs that doesn't believe that domain authority/trust is the single most important factor in SEO today, I really think you are ...


dogbiscuit
Summary: a nice analysis about how Google blindly assigns a high ranking to Wikipedia articles even when the articles are stubs or blank and there are other quality articles on the net.
thekohser
QUOTE(dogbiscuit @ Wed 9th July 2008, 1:46pm) *

Summary: a nice analysis about how Google blindly assigns a high ranking to Wikipedia articles even when the articles are stubs or blank and there are other quality articles on the net.


Seems like shoddy research to me.

QUOTE
Compare the Wikipedia page for Lavelanet on Wikipedia with the Page for Lavelanet on Mahalo

The wikipedia page is completely empty, is a pagerank 4, and ranks number 1. The mahalo page which has actual content is nowhere to be found in the SERPs.


Huh, completely empty?

And, the Mahalo page came up #5 in the SERP for me.

The guy's obviously got a legitimate beef about the Google algorithms rewarding Wikipedia, but at least get the evidence correct. Besides, if and when Google launches Knol, I'm sure they'll do a thing or two about how Wikipedia ranks versus similar Knol pages.

Greg
dogbiscuit
QUOTE(thekohser @ Wed 9th July 2008, 8:16pm) *

QUOTE(dogbiscuit @ Wed 9th July 2008, 1:46pm) *

Summary: a nice analysis about how Google blindly assigns a high ranking to Wikipedia articles even when the articles are stubs or blank and there are other quality articles on the net.


Seems like shoddy research to me.


Sorry, went all Wikipedian about checking citations there, didn't I? biggrin.gif
Jon Awbrey
Once again, if you want to see an example where Google promotes Wikipedia, even over sites that have an article where Wikipedia does not, try the following:

You get different results depending on the capitalization.

Jon cool.gif
Crestatus
I have seen articles I created just a minute or two before show up on Google search; the last time just today. Too bad Google doesn't do Asian mountains; maybe they'd have found Obama bin Laden by now.
Cock-up-over-conspiracy
I got my 'new page to 3rd position in Google ranking' down to a record 41 seconds.

It could have been quicker but I had to open a new tab on my browser.

As any web designer will tell you ... that is somewhere between impossible and unreal for any independent's website. Technically impossible even.

There is something wrong with their incestuous relationship ... but, on the other hand, if you are selling your paid services to corporations, it's a very, very good statistic to quote.
A User
QUOTE(Cock-up-over-conspiracy @ Wed 24th March 2010, 5:42pm) *

I got my 'new page to 3rd position in Google ranking' down to a record 41 seconds.

It could have been quicker but I had to open a new tab on my browser.

As any web designer will tell you ... that is somewhere between impossible and unreal for any independent's website. Technically impossible even.

There is something wrong with their incestuous relationship ... but, on the other hand, if you are selling your paid services to corporations, it's a very, very good statistic to quote.


It's quite possible that Jimbo did a deal with Google in conjunction with that $2 million donation.
thekohser
QUOTE(Cock-up-over-conspiracy @ Wed 24th March 2010, 2:42am) *

I got my 'new page to 3rd position in Google ranking' down to a record 41 seconds.

It could have been quicker but I had to open a new tab on my browser.

As any web designer will tell you ... that is somewhere between impossible and unreal for any independent's website. Technically impossible even.

There is something wrong with their incestuous relationship ... but, on the other hand, if you are selling your paid services to corporations, it's a very, very good statistic to quote.


A new page on Wikipedia Review can sometimes get into the Google Top 3 in less than an hour. Feel free to try some experimental testing.
RDH(Ghost In The Machine)
A Google of the Siege of Ancona 1815 turns up this masterpiece of WP:MINIMALISM.

As you can see it is not even a stub but an empty article containing the usual needz n0tez0rz bitch tag, the (wrong in this case) stub label and infobox.

In a way it illustrates the essence of Wikipedia and why it, ultimately, FAILS.
Alison
Earlier today, I went looking on WP for an article related to computing; Active State Power Management or ASPM. It's a pretty common term if you're into computer hardware, and I was surprised to see that Wikipedia didn't have an article on ASPM. Either way, it's something I know a bit about, so I hacked an article together just for fun.

Here's the kicker, though; it appeared on Google less than six minutes after I created it! confused.gif It's already in eighth place, beaten out only by Microsoft and Intel hmmm.gif That's pretty shocking.

I know that Kelly Martin once remarked that Google have their own Wikipedia 'recent changes' feed, which is all well and good and you can clearly see it in action here. The thing is, of course, that the article I wrote was benign and somewhat useful to fellow-nerds. But what if it had been some BLP hatchet-job? That would have hit the Google cache pretty-much instantaneously, and got stuck there. Not good ... unhappy.gif
Cock-up-over-conspiracy
QUOTE(Alison @ Tue 30th March 2010, 4:29am) *
I know that Kelly Martin once remarked that Google have their own Wikipedia 'recent changes' feed, which is all well and good and you can clearly see it in action here. The thing is, of course, that the article I wrote was benign and somewhat useful to fellow-nerds. But what if it had been some BLP hatchet-job? That would have hit the Google cache pretty-much instantaneously, and got stuck there. Not good ... unhappy.gif

Agreed ... and that is what happens.

Forgetting new pages for a moment ... I got a single-line edit (the addition of an obscure documentary) listed on Google, which timed it at one minute.

Like you saw ... that is a security liability. The next question then being ... how long does it stay in Google's cache and then get replicated elsewhere?
bambi
QUOTE(Cock-up-over-conspiracy @ Tue 30th March 2010, 9:53am) *

Like you saw ... that is a security liability. The next question then being ... how long does it stay in Google's cache and then get replicated elsewhere?

Quick to index, but slow to respond to a deleted article. I timed Google's cache hangover on a deleted Wikipedia article a few weeks ago. It took about two weeks for the cache copy to disappear. Most auto-scrapers wouldn't bother with a deleted article, but it wouldn't be hard to auto-check to see if a cache copy was still available, and use that instead.

The liability here is all on the Wikimedia Foundation (hello, Mike Godwin?). It is trivial to insert a NOARCHIVE meta in the headers of all articles, which would prevent Google from showing cache links. That's why Google doesn't feel any need to address the situation — they've already addressed it, and they are already off the hook.

Alison
QUOTE(bambi @ Tue 30th March 2010, 6:58am) *

The liability here is all on the Wikimedia Foundation (hello, Mike Godwin?). It is trivial to insert a NOARCHIVE meta in the headers of all articles, which would prevent Google from showing cache links. That's why Google doesn't feel any need to address the situation — they've already addressed it, and they are already off the hook.

Is there any reason, Daniel, why this hasn't happened yet? I can't see any advantage to not having it in there and, like you say, adding it to the robots.txt is pretty trivial. Is there some political reason, or Google reason, why this hasn't happened yet? We've all seen attack articles and other junk get 'stuck' in the cache for weeks on end, so this would seem a priority. Am I missing something here?
Milton Roe
QUOTE(Alison @ Mon 29th March 2010, 9:29pm) *

Here's the kicker, though; it appeared on Google less than six minutes after I created it! confused.gif It's already in eighth place, beaten out only by Microsoft and Intel hmmm.gif That's pretty shocking.

And completely addictive, too, for any writer whose main thrill is to know they are being read (not to be famous as a name, or make money).

I've had the same experience, and commented on it here on WR. But I don't think anybody really heard me. All the things said about MMORPGs are true, but there are other draws on people by WP, as well.

I suppose there's not much that can be said against this, alas for WR. Greg can say "write on MY site!" hrmph.gif mad.gif forever, but until he can get me to Google #8 in six minutes, I'm just not that tempted.
bambi
QUOTE(Alison @ Tue 30th March 2010, 4:51pm) *

Is there any reason, Daniel, why this hasn't happened yet? I can't see any advantage to not having it in there and, like you say, adding it to the robots.txt is pretty trivial. Is there some political reason, or Google reason, why this hasn't happened yet? We've all seen attack articles and other junk get 'stuck' in the cache for weeks on end, so this would seem a priority. Am I missing something here?

It doesn't work in robots.txt — it has to be in the header of every single page:
CODE

<HTML><HEAD>
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
...
</HEAD>

But that's still trivial, and all major search engines recognize and respect it.
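
For anyone who wants to check whether a given page actually carries that tag, here is a minimal sketch in Python using only the standard library. The names `RobotsMetaParser` and `has_noarchive` are made up for illustration, and it only looks at the meta tag in the HTML you hand it; it does not fetch pages or say anything about how a search engine will behave.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags.

    Hypothetical helper for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content attribute may hold several comma-separated directives.
        for part in attr.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def has_noarchive(html_text):
    """Return True if the HTML contains a robots meta tag with NOARCHIVE."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return "noarchive" in parser.directives


if __name__ == "__main__":
    page = '<HTML><HEAD><META NAME="ROBOTS" CONTENT="NOARCHIVE"></HEAD>'
    print(has_noarchive(page))
```

The comparison is case-insensitive, so the upper-case form shown above and a lower-case `noarchive` are treated the same.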

I suspect that Wikipediots fear that Google will be unhappy with Wikipedia if Wikipedia starts doing this on every page. Maybe so, but normally it has zero effect on rankings. If it didn't have zero effect on rankings, one could argue with more conviction that there is no viable opt-out for the cache copy, and therefore make a case that it's a copyright issue because Google has copied and republished the entire page.

I've been doing it on all my sites for nearly ten years, and I don't see any effect on rankings.

Here's a typically uninformed discussion of a failed proposal along these lines on Wikipedia. It's not NOCACHE, for example, it's NOARCHIVE, as someone pointed out after much silly discussion.

And here's an example from 2007 of why you may not want a "Cached" link on Wikipedia articles:


Daddy, daddy, I found George Washington on Wikipedia!

Image
BelovedFox
QUOTE(bambi @ Tue 30th March 2010, 7:35pm) *

QUOTE(Alison @ Tue 30th March 2010, 4:51pm) *

Is there any reason, Daniel, why this hasn't happened yet? I can't see any advantage to not having it in there and, like you say, adding it to the robots.txt is pretty trivial. Is there some political reason, or Google reason, why this hasn't happened yet? We've all seen attack articles and other junk get 'stuck' in the cache for weeks on end, so this would seem a priority. Am I missing something here?

It doesn't work in robots.txt — it has to be in the header of every single page:
CODE

<HTML><HEAD>
<META NAME="ROBOTS" CONTENT="NOARCHIVE">
...
</HEAD>

But that's still trivial, and all major search engines recognize and respect it.

I suspect that Wikipediots fear that Google will be unhappy with Wikipedia if Wikipedia starts doing this on every page. Maybe so, but normally it has zero effect on rankings. If it didn't have zero effect on rankings, one could argue with more conviction that there is no viable opt-out for the cache copy, and therefore make a case that it's a copyright issue because Google has copied and republished the entire page.

I've been doing it on all my sites for nearly ten years, and I don't see any effect on rankings.

Here's a typically uninformed discussion of a failed proposal along these lines on Wikipedia. It's not NOCACHE, for example, it's NOARCHIVE, as someone pointed out after much silly discussion.

And here's an example from 2007 of why you may not want a "Cached" link on Wikipedia articles:


Daddy, daddy, I found George Washington on Wikipedia!

Image


To be fair, looking at the failed proposal, it seems like there wasn't so much as an RfC held on the subject, it just fizzled out, so it's not like it was "soundly rejected" or anything. I'd certainly support it; Wikipedia's got enough search clout, and you do have to deal with the consequences of that.

Honestly, I'm not sure what google gets out of indexing so quickly; wouldn't it be more in their interest to display more stable results for certain non-trending topics?
Emperor
QUOTE(Alison @ Tue 30th March 2010, 12:29am) *

Earlier today, I went looking on WP for an article related to computing; Active State Power Management or ASPM. It's a pretty common term if you're into computer hardware, and I was surprised to see that Wikipedia didn't have an article on ASPM. Either way, it's something I know a bit about, so I hacked an article together just for fun.


Great, so you took time out of your day to help keep Wikipedia in the #1 position. Its power to hurt people will increase thanks to your labor.

Had you started the article on Encyc, you'd be helping a project where you have real power and influence, and aren't just simply running around slapping bandaids on a broken system.
bambi
QUOTE(BelovedFox @ Wed 31st March 2010, 2:53pm) *

Honestly, I'm not sure what google gets out of indexing so quickly; wouldn't it be more in their interest to display more stable results for certain non-trending topics?

Quite right.

The origin of this is a stupid meme pushed by pundits that found its way into Larry Page's head and got stuck there:
QUOTE
"I have always thought we needed to index the web every second to allow real time search," Le Meur quotes Page as saying. "At first, my team laughed and did not believe me. With Twitter, now they know they have to do it. Not everybody needs sub-second indexing but people are getting pretty excited about realtime."

Page's statement comes less than two weeks after Google execs told reporters that the company is looking at ways of integrating microblogging capabilities, such as those popularized by Twitter, into its search product.

What Page doesn't realize is that if you turn Google into another knee-jerk Twitter by jacking up the importance of recency in the rankings, you also lose a lot of the stability that made Google useful. This sort of thing happens when a search engine like Google has a $180 billion market capitalization, but gets 95 percent of its gross revenue from ads.
Eva Destruction
QUOTE(bambi @ Wed 31st March 2010, 4:49pm) *

QUOTE(BelovedFox @ Wed 31st March 2010, 2:53pm) *

Honestly, I'm not sure what google gets out of indexing so quickly; wouldn't it be more in their interest to display more stable results for certain non-trending topics?

Quite right.

The origin of this is a stupid meme pushed by pundits that found its way into Larry Page's head and got stuck there:
QUOTE
"I have always thought we needed to index the web every second to allow real time search," Le Meur quotes Page as saying. "At first, my team laughed and did not believe me. With Twitter, now they know they have to do it. Not everybody needs sub-second indexing but people are getting pretty excited about realtime."

Page's statement comes less than two weeks after Google execs told reporters that the company is looking at ways of integrating microblogging capabilities, such as those popularized by Twitter, into its search product.

What Page doesn't realize is that if you turn Google into another knee-jerk Twitter by jacking up the importance of recency in the rankings, you also lose a lot of the stability that made Google useful. This sort of thing happens when a search engine like Google has a $180 billion market capitalization, but gets 95 percent of its gross revenue from ads.

If I were Larry Page, I'd have Google instant-index results from news sites, but have a 24-hour lag for everything else. But, I'm not. (That approach obviously has its own problem, in that corrections or retractions won't be shown in the Google search until the next day, but I think that's less of an issue.)
Milton Roe
QUOTE(bambi @ Wed 31st March 2010, 8:49am) *

What Page doesn't realize is that if you turn Google into another knee-jerk Twitter by jacking up the importance of recency in the rankings, you also lose a lot of the stability that made Google useful. This sort of thing happens when a search engine like Google has a $180 billion market capitalization, but gets 95 percent of its gross revenue from ads.

Yep. The only "editing" the raw web has for error is the time-stability of publicly editable (or at least commentable) content. Plus that content that people can read over, later, after their short-term memories clear, and fix. Lose that component, and you get your ads pasted onto "Billy is teh gaaaay." And then they aren't worth so much. Which you said, but I thought I'd just say it again differently.

So, yeah, for Google, this is a stupid, stupid, stupid idea. Their original page-link following search structure explicitly made use of the time=thinking strategy, since it takes TIME and many decisions to make a link-route stronger. Nobody links to childish or stupid comments. That's how Google came to be. And here they are, forgetting the most important lesson they ever taught the world.

Must be brain-aging. unhappy.gif
EricBarbour
QUOTE(Milton Roe @ Wed 31st March 2010, 1:36pm) *
QUOTE(bambi @ Wed 31st March 2010, 8:49am) *
What Page doesn't realize is that if you turn Google into another knee-jerk Twitter by jacking up the importance of recency in the rankings, you also lose a lot of the stability that made Google useful. This sort of thing happens when a search engine like Google has a $180 billion market capitalization, but gets 95 percent of its gross revenue from ads.
So, yeah, for Google, this is a stupid, stupid, stupid idea. Their original page-link following search structure explicitly made use of the time=thinking strategy, since it takes TIME and many decisions to make a link-route stronger. Nobody links to childish or stupid comments. That's how Google came to be. And here they are, forgetting the most important lesson they ever taught the world.

Must be brain-aging. unhappy.gif

Hah. Must be the obscene rivers of ad money.

That's why Google is becoming a blight. It's not even their friendliness with spooks, or the "don't be evil" crap, or the stupid media reports of how they spoil their employees, or their near-successful campaign to monopolize internet search.

It's the fact that they're deliberately, openly degrading the quality of their primary product. The preference given to WP articles is just one example.