Schizm
the_undertow
Hey guys, it's been a while. I was thinking about the BLP problem at WP, and noticing that the general response to gross additions to BLPs is usually 'sorry, didn't notice that edit.'

Is it plausible (I know it's outrageous) to move all bios to something like wikibio.org, where these articles can be closely monitored? Does this kill the spirit of WP? Is this impossible and potentially messy? BLP seems like a problem that cannot be fixed, but it can perhaps at least be contained by having a dedicated site, as we have Wikiquote, Wiktionary, etc.

Maybe I'm just high. (I'm not high)

You can tell me it's the worst idea ever in the history of WR. I may cry, but I thought I would throw it out there.
gomi
QUOTE(the_undertow @ Fri 20th February 2009, 12:03am) *
Is this impossible and potentially messy?

Yes, it's impossible. The problem begins with biographical articles, but it ends nowhere near there. There are horrible biographical statements riddled all over Wikipedia articles that are not, per se, biographical.

The problem with Wikipedia is its central tenet of letting a bunch of biased, idiotic, and/or grudge-bearing doofuses create an attractive nuisance masquerading as a reference work. Putting band-aids on it won't work.

As I've said elsewhere, one scholar plus twenty teenage idiots is indistinguishable from twenty-one teenage idiots. Hence Wikipedia.
Somey
QUOTE(gomi @ Fri 20th February 2009, 2:14am) *
Yes, it's impossible. The problem begins with biographical articles, but it ends nowhere near there. There are horrible biographical statements riddled all over Wikipedia articles that are not, per se, biographical.

True... Still, let's not be too quick to dismiss it - in theory, if we assume that the forked site would have significantly lower Google rankings than Wikipedia itself, then the motivation to vandalize would be reduced proportionately. And it would (presumably) be easier to apply tighter controls over editing rights and version-approval on a separate site, given that the stakes would be lower and (hopefully) fewer revenge-seeking yabbos would be getting involved.

Ultimately, though, the Google-juice is what it's really all about. And like Gomi implies above, if you remove the BLP's from WP, people will just go to other WP articles to get their defamin' and slanderin' done - that's the nature of the beast. What's more, it's only what, 10 percent, maybe 20 percent of BLP articles that get targeted for various forms of defamatory vandalism and POV campaigning? That's still tens of thousands of articles, but you'd sort of be penalizing the other 80-90 percent by removing them from WP, for nothing more than their subjects not being controversial or disliked enough.
Doc glasgow
The problem has always been one of maintainability.

The Wikipedia toolset produces a range of articles: some of good quality, some improving, some hopeless, and a small number positively harmful to the subject. Sure, even the good ones may be wrong - but wikipedia, in fairness, never claimed otherwise (or not much).

The problem is that the community has never accepted that the same "eventualist" toolset does not work for lower-notability biographies. The motivation of those gaming the system is higher. The potential damage is greater. The "bad apple" rate is higher and, unlike on Pokemon stubs, ethically unacceptable.

However, wikipedia will not change, because 1) too few people accept that damage to third parties should feature in the equation; 2) there's a silly "free speech" notion still hanging about, whereby people think wikipedia should be as free in its choice of subject as the press, without any of the accountability of the press; and 3) maintainability has never been accepted as a factor in retention discussions. That something can be fixed into a fair, neutral bio does not mean it will be, or that it will be maintained as such. Systemically, Wikipedia cannot fix the proportion of bad bios. But when you say that, people ask for an example and then proceed to fix that one, forgetting that such fixes simply don't scale.

A "bad example" right now is this afd, which will result in a keep - ignoring the fact that Wikipedia has thousands of articles on lowgrade neo-nazis, mainly sourced from ADL-style websites and "maintained" by liberal activists. It will keep these because the wikiscientists do their sums with sources and work out that x sources * WP:NOTABILITY * WP:WTF = article. They never ask the maintenance question.

Wikipedia is not capable of maintaining 300,000 lowgrade BLPs without an unacceptable libel rate; until it stops trying, nothing will be fixed.

Flagged revisions, semi-protection, etc. may reduce the fail rate marginally. Removing the 50% of BLPs that are lower-notability, underwatched, poorly sourced, and higher-impact (because they are the only net source on the subject) is the only solution.

But Wikipediots are more concerned with vandalism on Palin and Obama - people they can't actually hurt - because it embarrasses them more than Joe Soap, who is being libelled by his editing ex-employee. (Why does Wales get upset about a dead Kennedy - who will not be hurt - and not about the average victim? The only concern is PR.)
Jon Awbrey
What Doc sez is true but sleep.gif

Here is the skinny —

Because bathroom wall space is the main commode-ity that Jimbo is specklating in these days.

Ja Ja boing.gif
the_undertow
Clever Jon smile.gif . So what I'm gathering is that the schism doesn't work in practice because the same admins will still be responsible for the content. My hope is that having a separate site would allow these admins to be more vigorous when scrubbing these bios. However, it's been pointed out that moving bios isn't the problem - it's much larger than that because the mentalities will transfer to the new domain as well.
Milton Roe
QUOTE(Jon Awbrey @ Fri 20th February 2009, 10:04pm) *

What Doc sez is true but sleep.gif

Here is the skinny —

Because bathroom wall space is the main commode-ity that Jimbo is specklating in these days.

Ja Ja boing.gif

Is that specklating or spacklating? I've seen Jimbo grout a few times about commodiousness, too, and it wasn't a very smooth job wink.gif
the_undertow
QUOTE(Somey @ Fri 20th February 2009, 12:44am) *


Ultimately, though, the Google-juice is what it's really all about.


I was always under the impression that WP articles rank high in a search query because while there may be one article, it has so many different revisions that it exists in plurality. Is this false? If so, why does WP produce a result for a majority of queries?
Daniel Brandt
QUOTE(the_undertow @ Sat 21st February 2009, 1:34am) *

I was always under the impression that WP articles rank high in a search query because while there may be one article, it has so many different revisions that it exists in plurality. Is this false? If so, why does WP produce a result for a majority of queries?

Not true. The current version is the only version that gets crawled by the search engines. Older versions are in the /w/ directory instead of the /wiki/ directory, and the /w/ directory is disallowed in robots.txt.
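You can check this yourself. Below is a minimal sketch using Python's standard urllib.robotparser, with rules modelled on Wikipedia's (an illustrative sample, not the live file, which carries many more entries):

CODE
import urllib.robotparser

# Illustrative rules modelled on Wikipedia's robots.txt (not the live file):
# the /w/ path, where old revisions and edit pages live, is disallowed.
rules = """\
User-agent: *
Disallow: /w/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Current article versions live under /wiki/ and are crawlable...
print(rp.can_fetch("*", "http://en.wikipedia.org/wiki/Example"))
# ...but an old revision is served via /w/index.php and is blocked.
print(rp.can_fetch("*", "http://en.wikipedia.org/w/index.php?title=Example&oldid=12345"))

The first check prints True and the second False, so only the current version of each article ever reaches the index.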

Here are some reasons why Wikipedia articles rank so well:

1. Google is under constant pressure to push ads, and to push pages that carry ads. About 97 percent of Google's total revenue is from ads. Once their revenue starts diving, their market capitalization will also dive, because their stock is overvalued. The extent to which "sponsored links" infiltrate the so-called generic search results is also the extent to which the quality of Google's results is perceived to be in decline. To counter this threat, Google needs Wikipedia. There is very likely some deliberate, artificial "juice" applied to Wikipedia, quietly determined by executives whispering to engineers in the hallways at Google. It's Top Secret, Non-Disclosure, etc. — no one will ever be able to prove it. Even a disgruntled ex-employee would get sued to hell and back if he tried to go public with the inside backroom scoop on Google's ranking manipulations.

2. Wikipedia also needs Google. The internal cross linking that is so prevalent in every Wikipedia article is designed to juice up the PageRank of other articles. On an average site, this amount of internal cross-linking would be considered spammy, and would get discounted by Google to a large extent. But Wikipedia is noncommercial, and Google is happy to let Wikipedia have the juice.

3. Biographies of living persons are a particular problem, because the keywords in the anchor text of links juice up the target page, and the keywords in the title are similarly important. Wikipedia uses the person's name in the anchor text as well as in the title. For semi-notable people who aren't named "John Smith" (in other words, where the name is not overly common and doesn't have a lot of competition for ranking), the Wikipedia bio invariably shoots to the top of Google in a search for that name. To compound the problem, anyone searching for information on someone starts by entering that someone's name in the search box, so the person gets 100 percent exposure unless they decide to change their name to "John Smith." Many employers routinely "google" everyone who applies for a job.

4. Many sites scrape content from Wikipedia because they need content on their pages in order to carry AdSense ads. Wikipedia itself isn't commercial, but it drives commercial sites. These external links also add to the Google juice of the Wikipedia articles, because the scrapers usually don't change the links. Scraping content is an easy way to generate hundreds or thousands of pages for your ad farms, using hundreds or thousands of domains. There are even programs that generate fake blogs by scraping phrases and stringing them together. It's nonsensical and even hilarious when you try to read it, but it works because the Google crawlers aren't smart enough to realize that it's fake. Here's an example of this.
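For a sense of how crude those fake blogs are, here is a toy sketch of the technique (the source text is a made-up stand-in, not any real scraper's code): a word-level Markov chain that strings scraped phrases together into crawlable nonsense.

CODE
import random

def build_chain(text):
    """Map each word to the list of words that follow it in the source."""
    words = text.split()
    chain = {}
    for a, b in zip(words, words[1:]):
        chain.setdefault(a, []).append(b)
    return chain

def babble(chain, length=25, seed=None):
    """Walk the chain, restarting at a random word on dead ends."""
    rng = random.Random(seed)
    word = rng.choice(list(chain))
    out = [word]
    for _ in range(length - 1):
        word = rng.choice(chain.get(word) or list(chain))
        out.append(word)
    return " ".join(out)

scraped = ("Wikipedia is a free encyclopedia that anyone can edit. "
           "Anyone can edit a biography and the biography can hurt anyone.")
print(babble(build_chain(scraped), seed=1))

The output reads as word salad to a human, but to a crawler matching keywords it looks like fresh, relevant content.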

I could go on and on, but now I'm getting depressed...
UseOnceAndDestroy
QUOTE(the_undertow @ Sat 21st February 2009, 7:34am) *
I was always under the impression that WP articles rank high in a search query because while there may be one article, it has so many different revisions that it exists in plurality. Is this false?

Very. If Google were doing anything with past revisions, its normal behaviour would be to hide them as duplicate results.
QUOTE

If so, why does WP produce a result for a majority of queries?

It's fixed. Wikipedia pages get to the top of Google results within hours of creation, with low quality and no inbound links. See this example, which you can repeat pretty much any time with a sample of newly-created pages.


Peter Damian
QUOTE(Daniel Brandt @ Sat 21st February 2009, 8:44am) *

It's nonsensical and even hilarious when you try to read it, but it works because the Google crawlers aren't smart enough to realize that it's fake. Here's an example of this.


Very funny. I liked the "flawed and raging poke about tool". Jimbo's penis?
the_undertow
This isn't limited to Google. Other search engines will produce the same result. Why?

QUOTE(Daniel Brandt @ Sat 21st February 2009, 12:44am) *

Here are some reasons why Wikipedia articles rank so well: ...


I did appreciate the detailed analysis; I applaud you for that. You've been quite a prick to me in the past. I hope this trend is now broken.
LessHorrid vanU
In short, the problem needs to be fixed on Wikipedia - cos nothing else has anything like the visibility. As an aside, wasn't this what Conservapedia was supposed to remedy - until it was realised that their agenda was to decide which biases were permitted, rather than to allow none?
Lar
QUOTE(the_undertow @ Sat 21st February 2009, 2:21am) *

Clever Jon smile.gif . So what I'm gathering is that the schism doesn't work in practice because the same admins will still be responsible for the content. My hope is that having a separate site would allow these admins to be more vigorous when scrubbing these bios. However, it's been pointed out that moving bios isn't the problem - it's much larger than that because the mentalities will transfer to the new domain as well.

Veropedia was another take on this (coming at it from a different angle... or maybe it was a poor man's sighted revisions) ... maybe some lessons from there???
Random832
QUOTE(Daniel Brandt @ Sat 21st February 2009, 8:44am) *

2. Wikipedia also needs Google. The internal cross linking that is so prevalent in every Wikipedia article is designed to juice up the PageRank of other articles. On an average site, this amount of internal cross-linking would be considered spammy, and would get discounted by Google to a large extent. But Wikipedia is noncommercial, and Google is happy to let Wikipedia have the juice.


The way PageRank is publicly explained, this would tend to _average_ (rather than strictly increase) the PageRank of WP pages - increasing the PageRank of lesser-known Wikipedia articles at the expense of the more popular ones. I also think "designed" is too strong a term - it assumes the internal linking has no benefit for actual reader navigation of the site. Certainly other wiki sites that don't enjoy a high PageRank have a substantially similar degree of internal cross-linking. The original c2 wiki, for instance, ranks highly for terms it coined, like "code smell" or "antipattern", but less for other relevant terms like "refactoring"; tvtropes, likewise, ranks decently for the numerous terms it coined that aren't substantially used elsewhere, but doesn't get much at all for its equally large number of articles about fictional works.

The way Google claims PageRank works, a site would not get an overall benefit from lots of internal linking: its pages that get fewer external links would get a higher PageRank at the expense of the pages that do get the external links, which is where the "juice" comes from in the first place. Since this is what they claim, then whether or not it's true, there would be no motive - no basis for someone to "design" a site to have such characteristics.

If wikipedia didn't have any "juice" coming in in the first place, it wouldn't have any to spread around between its pages. The real reason is the numerous people who sincerely link to wikipedia articles to give an explanation on some topic in web forums, blogs, etc.
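To make the averaging effect concrete, here's a toy power-iteration sketch of PageRank as publicly described (damping 0.85; a hypothetical four-page web with one external page E linking to site page A, and B and C as sibling pages on the same site - none of this is real data). Adding internal cross-links raises B and C mostly at the expense of A; the site-wide total also shifts somewhat depending on how dangling pages are treated, so treat the numbers as purely illustrative.

CODE
# Toy PageRank via power iteration. links: dict node -> list of linked nodes.
def pagerank(links, n, d=0.85, iters=100):
    pr = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - d) / n] * n
        for src, outs in links.items():
            if outs:
                share = d * pr[src] / len(outs)
                for dst in outs:
                    new[dst] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for dst in range(n):
                    new[dst] += d * pr[src] / n
        pr = new
    return pr

E, A, B, C = 0, 1, 2, 3

# Case 1: only the external link E -> A, no internal cross-linking.
sparse = {E: [A], A: [], B: [], C: []}
# Case 2: the same external link, plus dense cross-links among A, B, C.
dense = {E: [A], A: [B, C], B: [A, C], C: [A, B]}

for name, graph in (("sparse", sparse), ("dense", dense)):
    pr = pagerank(graph, 4)
    print(name, [round(x, 3) for x in pr])
# sparse: A ~0.38, B and C ~0.21 each
# dense:  A ~0.34, B and C ~0.31 each - flatter, as described above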
dtobias
QUOTE(Daniel Brandt @ Sat 21st February 2009, 3:44am) *

It's nonsensical and even hilarious when you try to read it, but it works because the Google crawlers aren't smart enough to realize that it's fake. Here's an example of this.


I kind of liked "nitpick scene designer Daniel Brandt", as well as "Wales view Brandt as a bigwig municipal digit."
Jon Awbrey
QUOTE(the_undertow @ Sat 21st February 2009, 2:21am) *

Clever Jon smile.gif . So what I'm gathering is that the schism doesn't work in practice because the same admins will still be responsible for the content. My hope is that having a separate site would allow these admins to be more vigorous when scrubbing these bios. However, it's been pointed out that moving bios isn't the problem — it's much larger than that because the mentalities will transfer to the new domain as well.


No, I think you're confusing Speckle-ating, which is what Speckle-ators do in this or that Commode-ity Markup, and Spackle-ating, which requires a Scraping first and a Wit-Washing after.

Ja Ja boing.gif
the_undertow
I have scrutinized why WP comes up first or close in searches. I have read over and considered all speculation, but I am still skeptical.

There has to be a less technical, definitive answer. How does Wikipedia manage to show up whenever a search is done? That is really what I'm concerned about. I still hold to the idea that articles with multiple revisions show up as they do because search engines see them as 'one article.'

By the way, I appreciate all the conversation that has been dedicated to this idea. I have created articles with numerous revisions, which goes against my own theory.

In short, I still don't get why WP ranks so high on any search engine. The site is truly as arbitrary as the next, right?

I would love to pretend WR and WP didn't exist to me. I know that will be taken the wrong way. It's nearly impossible to forget my time spent at both sites when I do search for 'fee simple' and hey, up comes WP.

I guess I could unplug, relax, have a beer, but the reality is that eventually I'm going to use my Internet to search and WP IS GOING TO COME UP. I still don't get why.
Jon Awbrey
QUOTE(the_undertow @ Sun 22nd February 2009, 6:41am) *

I have scrutinized why WP comes up first or close in searches. I have read over and considered all speculation, but I am still skeptical.


I started a thread somewhere on the Brin & Page Rank Algorithm. What makes it so easy to game is the fact that it favours Connectivity over actual Content.

I have elsewhere given examples of Wikipedia articles that have now been reduced to redirects or stubs, but they still come up high in the Non-Notebooked Google searches, because there are so many residual links from other Wikipedia articles, talk pages, user pages, project pages, scraper sites, etc.

Jon
Bottled_Spider
QUOTE(the_undertow @ Sun 22nd February 2009, 11:41am) *
I would love to pretend WR and WP didn't exist to me. I know that will be taken the wrong way. It's nearly impossible to forget my time spent at both sites when I do search for 'fee simple' and hey, up comes WP.

I guess I could unplug, relax, have a beer, but the reality is that eventually I'm going to use my Internet to search and WP IS GOING TO COME UP.

Start up Google and input:
fee simple -wikipedia

QUOTE
I still don't get why.

It's all a question of hand-eye coordination.
Chris Croy
QUOTE(Random832 @ Sat 21st February 2009, 5:07pm) *

If wikipedia didn't have any "juice" coming in in the first place, it wouldn't have any to spread around between its pages. The real reason is the numerous people who sincerely link to wikipedia articles to give an explanation on some topic in web forums, blogs, etc.

This. Looking for elaborate explanations when there's a simple one is not a worthwhile way to spend one's time. One of my favorite instances of linking to Wikipedia is when someone's official website does it because Wikipedia explains things better. The only example I can think of off-hand is Butch Walker's website: his biography section is just a link to Wikipedia, with the text "This is responsible for killing The Bio in general." beneath it.

---

A schism wouldn't accomplish anything. What difference does it make if I write "Kato's an ignorant slut" in an article called Kato or an article called Kato's Kalendar Korp?