Full Version: Rootology's NOCACHE proposal
Kato
Following the Obama screw-up, Wikipedia Reviewer Rootology has drafted a new proposal for WP:

http://en.wikipedia.org/wiki/Wikipedia:Sea...efault_proposal

QUOTE
As of mid-February 2009, Wikipedia allows all search engines to "cache" its pages. That is, if a search engine like Google happens to crawl a page, any inappropriate or "bad" content, including WP:BLP violations, may be propagated out onto the Internet for an indeterminate amount of time. However, we have the ability to set Wikipedia to be "NOCACHE" in our robots.txt file. The major benefit of this is that search engines would only report the "current" state of an article (or any page) at any given time.
At least once, a slightly prominent BLP article was vandalized with racial epithets, which the world's search engines then cached.[1] A vandal replaced the entire BLP article with three epithets.[2] According to the Wikipedia result on search engines, we were now referring to the BLP subject as "NIGGA".[3] The edit was reverted less than two minutes later, but the damage was done.[4]
That was one of the single most-watched BLP articles we've ever had--what chance do the hundreds of thousands of lesser-known BLP articles have? The idea behind this proposal would be to protect not just BLPs, but the integrity of our articles themselves from being cached with bad information, even temporarily.
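For what it's worth, search engines implement "no cache" as a per-page directive rather than as a robots.txt rule; the usual spelling is the noarchive robots meta tag, or the equivalent HTTP response header. A minimal sketch of what the proposal would amount to in practice:

```
<!-- In each page's <head>: ask crawlers not to offer a cached copy -->
<meta name="robots" content="noarchive">

# Or, set sitewide as an HTTP response header:
X-Robots-Tag: noarchive
```

Either form only suppresses the "Cached" link on the results page; it does not stop the page from being crawled, indexed, or shown in results.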
jch
I'd like to suggest that all of wikipedia be NOINDEX instead. Think it'd get any supports?
Viridae
QUOTE(jch @ Wed 18th February 2009, 9:08pm) *

I'd like to suggest that all of wikipedia be NOINDEX instead. Think it'd get any supports?


Nope, including none from me.
dtobias
Some version of flagged revisions, to prevent vandalized or other unchecked versions from getting cached or indexed, would be a sensible idea.
Random832
QUOTE(dtobias @ Wed 18th February 2009, 1:19pm) *

Some version of flagged revisions, to prevent vandalized or other unchecked versions from getting cached or indexed, would be a sensible idea.

The only way to do that would be to not SERVE those versions to search engines - nocache doesn't keep the text off the search results page; it just prevents you from clicking through to see the entire version the Googlebot saw... and no-one ever does that anyway. This is an empty proposal that will solve nothing.
JoseClutch
QUOTE(dtobias @ Wed 18th February 2009, 8:19am) *

Some version of flagged revisions, to prevent vandalized or other unchecked versions from getting cached or indexed, would be a sensible idea.

+1
Would read again.
dtobias
How about a NOCASH proposal, to take away most of everybody's money who put it in any kind of investment... wait, I think that already went into effect last year.
Jon Awbrey
QUOTE(Kato @ Wed 18th February 2009, 4:40am) *

Following the Obama screw up, Wikipedia Reviewer Rootlogy has drafted a new proposal for WP:

en.wikipedia.org/wiki/Wikipedia:Search Engine NOCACHE by default proposal


Wikipedia has been NOCACHE since 02 Sep 2006.

Φat lot of good that did 'em!

Ja Ja
jch

How about adding "/" to their robots.txt? That'd do the same job, and make plenty of people with BLPs happy.
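For reference, the robots.txt change being suggested here is the one-rule block-everything form. A sketch (compliant crawlers would stop fetching the site entirely, though already-indexed URLs can linger in results for a while):

```
User-agent: *
Disallow: /
```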
Bottled_Spider
How about making two versions of every article? One, very short and to-the-point, ie "Barack Obama is a guy who won some sort of competition and became Prime Minister of America, or something" that is allowed to be crawled by search-engines, and the second article search-engine proof that can contain any old shite? I think we're on to a winner, here.
jch
QUOTE(Bottled_Spider @ Wed 18th February 2009, 5:24pm) *

How about making two versions of every article? One, very short and to-the-point, ie "Barack Obama is a guy who won some sort of competition and became Prime Minister of America, or something" that is allowed to be crawled by search-engines, and the second article search-engine proof that can contain any old shite? I think we're on to a winner, here.

If you're going that far...

The first version points at Britannica Online,

The second shows WP?
dogbiscuit
QUOTE(Bottled_Spider @ Wed 18th February 2009, 5:24pm) *

How about making two versions of every article? One, very short and to-the-point, ie "Barack Obama is a guy who won some sort of competition and became Prime Minister of America, or something" that is allowed to be crawled by search-engines, and the second article search-engine proof that can contain any old shite? I think we're on to a winner, here.

The ultimate solution is not flagged revisions but flagged editions: a recognition that a product as widely published as Wikipedia is (or is it?) needs mechanisms for selecting what text goes in which edition.

The Short and To The Point version is supposed to be the lead. It would be perfectly reasonable to have Wikipedia publish only the lead in web searches and present just that, with the next level available at a click (with some cookie stuff to control editions).

If Wikipedia was serious about producing a useful reference work, you'd think they'd be leaping at including technology to allow text to be flagged as appropriate for different editions - editorial discretion together with not being censored. Unfortunately the pea branes over there are so fixated on OMGCENSORSHIPlUtZ!!! that they cannot conceive of why editorial control might be a good thing.

It could lead to whole new game-playing roles, where new editors review text and assign sections to categories (ADULT, SEXUAL CONTENT, SCANDAL, BORING SCIENCE, TREKKIES ONLY and so on); with a set of categories chosen for the reader, the browser could readily deliver articles. Think of the fun to be had making sure that they read well regardless of filtering.
GlassBeadGame
QUOTE(Bottled_Spider @ Wed 18th February 2009, 12:24pm) *

How about making two versions of every article? One, very short and to-the-point, ie "Barack Obama is a guy who won some sort of competition and became Prime Minister of America, or something" that is allowed to be crawled by search-engines, and the second article search-engine proof that can contain any old shite? I think we're on to a winner, here.


Then maybe someone could collect all those "to the point articles" into some kind of useful reference work, something on the order of an encyclopedia.
Bottled_Spider
QUOTE(GlassBeadGame @ Wed 18th February 2009, 5:54pm) *
Then maybe someone could collect all those "to the point articles" into some kind of useful reference work, something on the order of an encyclopedia.

Yeah! I'd call it "The No-Shit Encyclopaedia - Everything You Want To Know Minus The Boring Stuff!". Stranger things have happened. Stir in some of the advice given by jch and dogbiscuit, above, and it would be a publishing sensation.
Milton Roe
QUOTE(GlassBeadGame @ Wed 18th February 2009, 10:54am) *

QUOTE(Bottled_Spider @ Wed 18th February 2009, 12:24pm) *

How about making two versions of every article? One, very short and to-the-point, ie "Barack Obama is a guy who won some sort of competition and became Prime Minister of America, or something" that is allowed to be crawled by search-engines, and the second article search-engine proof that can contain any old shite? I think we're on to a winner, here.


Then maybe someone could collect all those "to the point articles" into some kind of useful reference work, something on the order of an encyclopedia.

You can think of variations:

LEAD-pedia: Consists of nothing but the LEAD sections of all the stable versions of WP articles. No illustrations.

LEAD-pedia Condensed: Consists of the top "N thousand" of the above articles (pick your cutoff for "top" to get the size you like), as ranked by page views per day, with the assessment period chosen to be before the announcement, to prevent post-announcement gaming.

You know, with no illustrations you could probably get every lead and stub of the entire English Wikipedia onto one 4 GB flash drive. Then you could look a lot of stuff up, probably to the level you're interested in, on the go, without any net or cell connection at all.
One
QUOTE(Viridae @ Wed 18th February 2009, 10:59am) *

QUOTE(jch @ Wed 18th February 2009, 9:08pm) *

I'd like to suggest that all of wikipedia be NOINDEX instead. Think it'd get any supports?


Nope, including none from me.

Why not? Many of the site's problems are exacerbated by search engines. Was a much more communal place before everything was hit #1.
Sarcasticidealist
I don't see the "indexing the lead only" solution (or any indexing solution short of just not indexing the article) as really much of a solution at all. I suspect that most BLP damage is inflicted by something like the following process:

1. Somebody wants to know about Person X.
2. Person Googles "Person X" (along with a few other identifiable words for people with common names - "Steve Smith Fredericton", for example, or "Steve Smith complete prat").
3. Person sees that Person X has a Wikipedia article; clicks on it.

Indexing only the lead would do nothing to disrupt the above process.
dtobias
QUOTE(One @ Wed 18th February 2009, 3:44pm) *

Why not? Many of the site's problems are exacerbated by search engines. Was a much more communal place before everything was hit #1.


When it was a small, obscure, geeky project then it was in a sense more fun for geeks like me, yes... but that doesn't mean that it's a good idea to try to turn the clock back and stop the outside world from finding and using it.
Milton Roe
QUOTE(One @ Wed 18th February 2009, 1:44pm) *

QUOTE(Viridae @ Wed 18th February 2009, 10:59am) *

QUOTE(jch @ Wed 18th February 2009, 9:08pm) *

I'd like to suggest that all of wikipedia be NOINDEX instead. Think it'd get any supports?


Nope, including none from me.

Why not? Many of the site's problems are exacerbated by search engines. Was a much more communal place before everything was hit #1.

And a variation: NOINDEX only for BLPs. Except for the dead-tree famous ones.
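Per-article NOINDEX is already technically possible: MediaWiki has a __NOINDEX__ magic word that makes the rendered page carry a robots meta tag, so a BLP template could apply it wholesale. A sketch (the exact policy string emitted is an assumption):

```
# In the article wikitext, e.g. transcluded by a BLP template:
__NOINDEX__

# Roughly what then appears in the rendered page's <head>:
<meta name="robots" content="noindex,follow">
```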
One
QUOTE(Milton Roe @ Wed 18th February 2009, 9:16pm) *

QUOTE(One @ Wed 18th February 2009, 1:44pm) *

QUOTE(Viridae @ Wed 18th February 2009, 10:59am) *

QUOTE(jch @ Wed 18th February 2009, 9:08pm) *

I'd like to suggest that all of wikipedia be NOINDEX instead. Think it'd get any supports?


Nope, including none from me.

Why not? Many of the site's problems are exacerbated by search engines. Was a much more communal place before everything was hit #1.

And a variation: NOINDEX only for BLPs. Except for the dead-tree famous ones.

There we go.
Kato
QUOTE(Sarcasticidealist @ Wed 18th February 2009, 8:52pm) *

1. Somebody wants to know about Person X.
2. Person Googles "Person X" (along with a few other identifiable words for people with common names - "Steve Smith Fredericton", for example, or "Steve Smith complete prat").

Hey, have you been tracking my browsing history?

Milton Roe
QUOTE(One @ Wed 18th February 2009, 2:23pm) *

QUOTE(Milton Roe @ Wed 18th February 2009, 9:16pm) *

QUOTE(One @ Wed 18th February 2009, 1:44pm) *

QUOTE(Viridae @ Wed 18th February 2009, 10:59am) *

QUOTE(jch @ Wed 18th February 2009, 9:08pm) *

I'd like to suggest that all of wikipedia be NOINDEX instead. Think it'd get any supports?


Nope, including none from me.

Why not? Many of the site's problems are exacerbated by search engines. Was a much more communal place before everything was hit #1.

And a variation: NOINDEX only for BLPs. Except for the dead-tree famous ones.

There we go.

I am amazed that I (or somebody) didn't think of this as soon as we learned about NOINDEX. Hell, even Brandt didn't propose it.

I guess when people want the whole pie, they'll never settle for a slice. Myself, I prefer to slice away a slice at a time till it's gone.
JoseClutch
QUOTE(Milton Roe @ Wed 18th February 2009, 10:57pm) *

QUOTE(One @ Wed 18th February 2009, 2:23pm) *

QUOTE(Milton Roe @ Wed 18th February 2009, 9:16pm) *

QUOTE(One @ Wed 18th February 2009, 1:44pm) *

QUOTE(Viridae @ Wed 18th February 2009, 10:59am) *

QUOTE(jch @ Wed 18th February 2009, 9:08pm) *

I'd like to suggest that all of wikipedia be NOINDEX instead. Think it'd get any supports?


Nope, including none from me.

Why not? Many of the site's problems are exacerbated by search engines. Was a much more communal place before everything was hit #1.

And a variation: NOINDEX only for BLPs. Except for the dead-tree famous ones.

There we go.

I am amazed that I (or somebody) didn't think of this as soon as we learned about NOINDEX. Hell, even Brandt didn't propose it.

I guess when people want the whole pie, they'll never settle for a slice. Myself, I prefer to slice away a slice at a time till it's gone.

Naw, that has been proposed several times. And pretty rightly rejected; if an article should be noindexed, it should not exist.
dogbiscuit
QUOTE(JoseClutch @ Thu 19th February 2009, 2:48pm) *

QUOTE(Milton Roe @ Wed 18th February 2009, 10:57pm) *

I am amazed that I (or somebody) didn't think of this as soon as we learned about NOINDEX. Hell, even Brandt didn't propose it.

I guess when people want the whole pie, they'll never settle for a slice. Myself, I prefer to slice away a slice at a time till it's gone.

Naw, that has been proposed several times. And pretty rightly rejected; if an article should be noindexed, it should not exist.

That is a flawed argument, based on a failure to distinguish data gathering from publication. Subscribing to a model of Wikipedia in which gathering any information is fair game does not preclude the idea that some of that information might not be published: it may not yet be edited into a fit state, or it may be inappropriate to publish because of BLP concerns, while holding it privately in some form might still be reasonable.

Wikipedia hasn't worked out how to separate the two. When Wikipedia was an experiment and a small project, accidentally leaving work in progress in public view was not a big deal. Now, as it sits at the very top of the search engine heap, it is. So if NOINDEX is the only way to stop such material from being published widely, it at least makes a reasonable compromise between the data gatherers, who would claim it a mortal sin to throw information away, and those with at least a modicum of morals in their make-up.