It's slowly dying
EricBarbour
All the indicia in the charts I've prepared show the same thing: since 2007 or thereabouts, participation isn't merely flatlining, it's dropping.

And here's something even more disturbing, because it comes from the WMF itself.
Note that, for the first time since 2008, in 2011 all the indicia started to go down.
(Ever-increasing web traffic was always the most important part of the Wiki-Magic.
But now even that isn't working.)

Image
gomi
I don't think that this solves the problem that is Wikipedia. Like Lenin, there are acolytes who will carefully tend the corpse of Wikipedia long after its demise, and it will be used to commit offenses far into the future.
GlassBeadGame
Moved off topic post to Thank You Mr. Know It All in the annex.
communicat
North America appears to be flat-lining, i.e. brain dead.
EricBarbour
QUOTE(gomi @ Tue 18th October 2011, 11:52am) *

I don't think that this solves the problem that is Wikipedia. Like Lenin, there are acolytes who will carefully tend the corpse of Wikipedia long after its demise, and it will be used to commit offenses far into the future.

Wikipedia will become like the Open Directory Project.
Completely closed, run by a small gang of totally insane nerds, hostile to outsiders. And used by no one.

It's one of the great mysteries of the web: why AOL and (whatever's left of) Netscape keep it online.
It's not very big, not updated very often, and not very comprehensive. Yet there it sits, year after year.

(Here's something to mull over: Wikipedia is obsessed with football, specifically Association football/soccer. It's the single largest subject area on Wikipedia. And if you look at dmoz's sports section, guess what the largest sports subject area is.)
jd turk
QUOTE(EricBarbour @ Thu 20th October 2011, 1:54pm) *

Wikipedia will become like the Open Directory Project.
Completely closed, run by a small gang of totally insane nerds, hostile to outsiders. And used by no one.


When you put it that way, we don't seem too far off.
communicat
QUOTE(jd turk @ Thu 20th October 2011, 10:35pm) *

QUOTE(EricBarbour @ Thu 20th October 2011, 1:54pm) *

Wikipedia will become like the Open Directory Project.
Completely closed, run by a small gang of totally insane nerds, hostile to outsiders. And used by no one.


When you put it that way, we don't seem too far off.

applause.gif applause.gif applause.gif
jd turk
QUOTE(communicat @ Thu 20th October 2011, 3:46pm) *

QUOTE(jd turk @ Thu 20th October 2011, 10:35pm) *

QUOTE(EricBarbour @ Thu 20th October 2011, 1:54pm) *

Wikipedia will become like the Open Directory Project.
Completely closed, run by a small gang of totally insane nerds, hostile to outsiders. And used by no one.


When you put it that way, we don't seem too far off.

applause.gif applause.gif applause.gif


Well, we're right on with the nerds and the hostility. Wikipedia is still widely used but people are becoming more and more aware that what they're reading very well could be incorrect crap. It'll take a while before it phases out of the public view. With the population as lazy as they are, it'll also probably take a widely-available alternative. Even with the internet, the majority of people don't want to have to do any research themselves.
communicat
QUOTE(gomi @ Tue 18th October 2011, 8:52pm) *

I don't think that this solves the problem that is Wikipedia. Like Lenin, there are acolytes who will carefully tend the corpse of Wikipedia long after its demise, and it will be used to commit offenses far into the future.

WP will remain popular so long as there are people who enjoy being spoon-fed predigested information; and also so long as there's an education system that stifles critical thinking and original research by allowing, even encouraging, student essays to plagiarise WP material. Never mind that WP material itself is in a sense just one step removed from plagiarism by disallowing its own "editors" from engaging in critical thinking and OR.
Maetu
QUOTE
(Here's something to mull over: Wikipedia is obsessed with football, specifically Association football/soccer. It's the single largest subject area on Wikipedia. And if you look at dmoz's sports section, guess what the largest sports subject area is.)

It's one of the few sports that has total world wide penetration.
The Joy
QUOTE(Maetu @ Thu 27th October 2011, 7:14pm) *

QUOTE
(Here's something to mull over: Wikipedia is obsessed with football, specifically Association football/soccer. It's the single largest subject area on Wikipedia. And if you look at dmoz's sports section, guess what the largest sports subject area is.)

It's one of the few sports that has total world wide penetration.


No, that's Wikimedia Commons! wtf.gif
lilburne
But they are adding really cool features all the time.
everyking
There's no significance in that slight drop, which covers only a few months of 2011. You can see previous points in the chart where traffic dropped for comparable periods.
Tarc
QUOTE(everyking @ Fri 28th October 2011, 9:00am) *

There's no significance in that slight drop, which covers only a few months of 2011. You can see previous points in the chart where traffic dropped for comparable periods.


Don't tap on the glass, you'll confuse the fish.
Silver seren
QUOTE(everyking @ Fri 28th October 2011, 1:00pm) *

There's no significance in that slight drop, which covers only a few months of 2011. You can see previous points in the chart where traffic dropped for comparable periods.


That's exactly what I was thinking. There would have to be a sharper dip than that to prove a downward trend, though it does appear like the lines are leveling out.
chrisoff
QUOTE
WP will remain popular so long as there are people who enjoy being spoon-fed predigested information; and also so long as there's an education system that stifles critical thinking and original research by allowing, even encouraging, student essays to plagiarise WP material. Never mind that WP material itself is in a sense just one step removed from plagiarism by disallowing its own "editors" from engaging in critical thinking and OR.

Per Communicat (above)

Unique visitors by region hrmph.gif

All that shows is that more people have internet access and click the wikipedia articles high on search engine lists. Don't necessarily read them. Just the lede maybe.

Says nothing about editors dropping off, or rate of article improvement/accuracy. Jimbo wants it to become an academic repository, gutting books of their contents and justifying that with a "citation". Books are better than FA articles.

I read the pedia just to see all the infighting and backbiting, the POV battles, the power trips, and the important things like no one wants to run for admin status.

Bored with editing it now. The FAC warriors are killing it. That is one ugly clique.

EricBarbour
QUOTE(Silver seren @ Fri 28th October 2011, 10:12am) *

QUOTE(everyking @ Fri 28th October 2011, 1:00pm) *

There's no significance in that slight drop, which covers only a few months of 2011. You can see previous points in the chart where traffic dropped for comparable periods.
That's exactly what I was thinking. There would have to be a sharper dip than that to prove a downward trend, though it does appear like the lines are leveling out.

If you gents knew what I know about editing patterns, you wouldn't be trying to rationalize it away.
Even the administrators are quitting in droves. And no one's talking about it---because the WMF
is deliberately keeping them on the admin rolls. Even the ones who disappeared years ago.
Malleus
QUOTE(EricBarbour @ Fri 28th October 2011, 8:11pm) *

QUOTE(Silver seren @ Fri 28th October 2011, 10:12am) *

QUOTE(everyking @ Fri 28th October 2011, 1:00pm) *

There's no significance in that slight drop, which covers only a few months of 2011. You can see previous points in the chart where traffic dropped for comparable periods.
That's exactly what I was thinking. There would have to be a sharper dip than that to prove a downward trend, though it does appear like the lines are leveling out.

If you gents knew what I know about editing patterns, you wouldn't be trying to rationalize it away.
Even the administrators are quitting in droves. And no one's talking about it---because the WMF
is deliberately keeping them on the admin rolls. Even the ones who disappeared years ago.

So why not share what you know?
radek
QUOTE(chrisoff @ Fri 28th October 2011, 1:03pm) *

QUOTE
WP will remain popular so long as there are people who enjoy being spoon-fed predigested information; and also so long as there's an education system that stifles critical thinking and original research by allowing, even encouraging, student essays to plagiarise WP material. Never mind that WP material itself is in a sense just one step removed from plagiarism by disallowing its own "editors" from engaging in critical thinking and OR.

Per Communicat (above)

Unique visitors by region hrmph.gif

All that shows is that more people have internet access and click the wikipedia articles high on search engine lists. Don't necessarily read them. Just the lede maybe.

Says nothing about editors dropping off, or rate of article improvement/accuracy. Jimbo wants it to become an academic repository, gutting books of their contents and justifying that with a "citation". Books are better than FA articles.

I read the pedia just to see all the infighting and backbiting, the POV battles, the power trips, and the important things like no one wants to run for admin status.

Bored with editing it now. The FAC warriors are killing it. That is one ugly clique.


The image you want for editors is this one:

Image

here it's pretty clear that the English Wiki, at least, is bleeding editors who are - potentially - doing something worthwhile, like an innocent virgin beset by a horde of brain-eating zombies (sorry, it is close to Halloween). The trend is the same, but less pronounced, for editors with a smaller number of edits per month, but keep in mind that that statistic includes things like the rate of vandalism, etc.

The way the graph is scaled, with all the individual Wikis slapped in together, very much hides the extent of the problem. But basically, for the English Wikipedia the number of active editors has gone down by about 20% since April 2008.

German and French wikipedias are more or less constant. Italian one has gone down, but that one has always had a lot of volatility so who knows (if you know anything about the business cycle of the Italian economy vis-a-vis the French and German one, this is actually sort of funny). Russian Wikipedia has been taking off though that's probably because widespread internet access there is relatively recent. Commons is also booming, but knowing what we know about Commons... is that a good thing?

For article creation look at this one

Image

Here Spanish Wikipedia and, again, Russian, are looking good but the English one is decomposing faster than a corpse left out in the Mexican desert. Again, the way they're presenting the data doesn't make it look that bad, but look at the axis and you see that the decline in new articles per day since March 2008 has been something like ... 45%. That's a FALL of 45%.

And some of the posters above are right in that it's also very possible that the composition of new articles created has changed. Currently something like 60% of all English Wikipedia articles are stubs. Out of those, something like 10 to 15% are actually bot-created stubs (maybe even more). I don't have super concrete data on this right now (trying to collect some) but I very much suspect that the share of "stubs which remain stubs forever" in "new articles created" has gone up significantly over time.

So the graph above actually UNDERSTATES the decline in number of articles created per day/month, whatever (there is actually a bit of an upswing in the last few months, but again, it's too early to tell if that's real or some kind of "push" to offset the trend, or just plain old statistical noise in the data).

So based on that we know that the rate at which new articles are being created has fallen by AT LEAST 42% or so since March 2008 (and much more significantly so since 2007). You make an educated guess as to the extent to which bot-generated and Dr. Blofeld generated (is that a bot? not sure) stubs have replaced actual content-containing articles in the "new articles" pool and you get something like anywhere from 60% to 80% down.

And then there's the trend, which is strictly downward.


Also, to quickly address everyking's point. I do think that the original chart that Eric posted - which is really the original chart that Wikimedia posted, with all the caveats that entails - is not very informative. What you'd really want is unique visitors normalized by total web traffic. Ideally, something like "(unique visitors to Wikipedia)/(unique visitors to an average Website)". So if the number of unique visitors to Wikipedia is constant but the denominator has been going up, the fall in that graph would be much more dramatic.
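
In symbols (notation mine, just to pin the suggestion down):

$$R_t = \frac{UV^{\mathrm{Wikipedia}}_t}{UV^{\mathrm{average\ site}}_t}$$

If the numerator holds flat while the denominator keeps growing with overall internet use, $R_t$ falls even though the raw visitor chart looks level.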
Tarc
QUOTE(radek @ Fri 28th October 2011, 9:22pm) *
So based on that we know that the rate at which new articles are being created has fallen by AT LEAST 42% or so since March 2008 (and much more significantly so since 2007). You make an educated guess as to the extent to which bot-generated and Dr. Blofeld generated (is that a bot? not sure) stubs have replaced actual content-containing articles in the "new articles" pool and you get something like anywhere from 60% to 80% down.

And then there's the trend, which is strictly downward.


Is that necessarily a bad thing, though? Take any topic, let's say the Beatles. Editors have gone there over the years, created articles on the band, albums, tours, controversies, places, and influences. A Beatles fan comes along post-2008, and finds most everything he can think of to be covered already. Same with others, from WWII to Lost to Lady Gaga, perhaps there is a saturation point where there just isn't much new content to generate, therefore less editor participation.

Of course there are current events to write about, from the legitimate (Libyan revolution) to the asinine (anything the ARSEholes promote...listeria outbreaks, crazy guys who let exotic animals loose, Rebecca Black clones, etc...) but that has always been the case.
radek
QUOTE(Tarc @ Fri 28th October 2011, 9:27pm) *

QUOTE(radek @ Fri 28th October 2011, 9:22pm) *
So based on that we know that the rate at which new articles are being created has fallen by AT LEAST 42% or so since March 2008 (and much more significantly so since 2007). You make an educated guess as to the extent to which bot-generated and Dr. Blofeld generated (is that a bot? not sure) stubs have replaced actual content-containing articles in the "new articles" pool and you get something like anywhere from 60% to 80% down.

And then there's the trend, which is strictly downward.


Is that necessarily a bad thing, though? Take any topic, let's say the Beatles. Editors have gone there over the years, created articles on the band, albums, tours, controversies, places, and influences. A Beatles fan comes along post-2008, and finds most everything he can think of to be covered already. Same with others, from WWII to Lost to Lady Gaga, perhaps there is a saturation point where there just isn't much new content to generate, therefore less editor participation.

Of course there are current events to write about, from the legitimate (Libyan revolution) to the asinine (anything the ARSEholes promote...listeria outbreaks, crazy guys who let exotic animals loose, Rebecca Black clones, etc...) but that has always been the case.


Yeah there's that. But thinking about it, couple points:

1. It's a philosophical issue. Sure, you can write everything there is to write about one particular narrowly defined topic like "The Beatles". But that doesn't mean the number of topics has been exhausted. Related to that...
2. It's a distributional issue. Some topics, like "The Beatles" or "Football", probably have been exhausted. But even in the stuff I'm interested in, I'm pretty sure there are a lot of topics which are woefully undercovered (or, when covered, covered really atrociously). The economics topic area is still pathetic. History, outside of British and American history and WWII, is still junk. And like I said, that's just my interests. I'm sure others can provide their own examples. And yeah, that does have a lot to do with the Wikipedia demographic.
3. It's a mathematical issue. In other words, what is it converging to? Zero articles per month, or some lower but positive level? The first one does sort of imply that it's "dying", though it might be of 'old age' rather than 'disease'.
4. It's a hidden issue. Ok, so maybe there is some ultimate number of articles it makes sense for Wikipedia to have, and beyond that it's cruft. But then there should be a switch from article creation to article improvement. As the data on # of active editors indicates, though, that ain't happening. And as I alluded to above, a lot of these stubs stay stubs for a very long period of time, like "ever". And just anecdotally, from what I observe, there's a buttload of old articles that have sucked for a very very long time. Basically, there's absolutely no incentive - maybe even a negative incentive - to improve old articles.

And...

5. Even ignoring all that, and supposing that there's some optimal finite number of articles to be had, and all of them are just freakin' awesome ... it's just dishonest for Wikipedia to essentially lie about its article growth rate by inflating it via the inclusion of one or two sentence content-free stubs. Right now on the main page it says "3,780,433 articles in English". Nah. If they're lucky they got 2,268,259. (and what number of these are long but still crap?). And most of the new ones being created don't count. So just fess up. Claim victory, do a victory lap and go home. Quit lying.
EricBarbour
QUOTE(Malleus @ Fri 28th October 2011, 6:22pm) *

So why not share what you know?

In good time. No cockamamie free license, though. tongue.gif

One of WP's darkest secrets: people are quitting, including admins and others who used to write
lots of really good articles. Because participation has been declining more-or-less steadily since 2008,
the remaining fanboys are writing custom script bots to generate "content". It's not good content, it's
mostly crap stubs and short articles using information scraped from other sites. Many of which are
for-profit and copyrighted. Good examples are all the football-player stubs being created right now,
obviously by automated software.

Most of that info is scraped from various league websites. Olympic athletes are being scraped from
databaseOlympics and sports-reference.com. Boxers are being scraped from BoxRec. Etc, etc.

(I've thought about asking the operators of Sports Reference if they know how many gigabytes of data
have been stolen from their site by Wikipedia nerds. It is possible they have no idea.)

Same for various bots that scrape geographical trivia from other language WPs and other sites and
generate useless stubs about them, in "English", to post on English WP. Blofeld is notorious for doing
that but he's far from being the only one. Jaguar has similar bots, and is doing Chinese villages
this week (he knows nothing about China or the Chinese language).

Plus Merovingian's minor-planet bot, NotWith's obscure-bugs-and-plants bot, and many many more.

They are making crap articles that no one wants to read and no one will ever expand.
As mentioned here, they know what they are doing and they don't care.

Don't believe me? Spend a few hours looking thru new pages. More than a thousand new articles
per day, most of them crap.

Because nobody really knows how many scripts there are or what they are doing, the WMF can sit
there and claim ignorance, and the few active admins will keep grinding out vandalism repairs and
some bot owners can keep grinding out useless article reformattings and trivial things. It all goes into
the edit statistics, and makes Wikipedia look less sick than it actually is.

Did you know that Wikipedia has 1350 known bots? And there are many that no one knows about.
Because no one is keeping track.

Ever wondered why 5000 new accounts are created every day?
And why a large percentage of them are created by various bots, and tagged "created automatically"?
I figure at least 60% of those accounts make ZERO edits. Ever.

This is how Wikipedia will die. It will be nibbled to death, by the remaining crazies.
EricBarbour
QUOTE(radek @ Fri 28th October 2011, 7:43pm) *

Right now on the main page it says "3,780,433 articles in English". Nah. If they're lucky they got 2,268,259. (and what number of these are long but still crap?).

Not even that many. I figure that maybe 300,000 are actually useful (206,000 are more than 15k bytes). The rest are garbage, stubs, disambiguation pages, etc. Many long articles are crap, and IMO most of those "List of" things are utterly useless bullshit.

This is what Wikipedians are like.
Even when you beat them over the head with facts, they deny and deny and deny.
Kelly Martin
QUOTE(Tarc @ Fri 28th October 2011, 9:27pm) *
Is that necessarily a bad thing, though? Take any topic, let's say the Beatles. Editors have gone there over the years, created articles on the band, albums, tours, controversies, places, and influences. A Beatles fan comes along post-2008, and finds most everything he can think of to be covered already. Same with others, from WWII to Lost to Lady Gaga, perhaps there is a saturation point where there just isn't much new content to generate, therefore less editor participation.
Except it's not true. Yes, most of the low-hanging fruit has been picked, but there's a huge space of articles that have not yet been written. Wikipedia is terrible at producing decent articles about topics. That's because nearly anybody can pastiche together some semblance of a biographical article about a person, or something similar about a place, or an event, without really understanding the topic in question, but to write a good article about a topic—that is, about a whole idea, not just a single thing—requires a real understanding of the topic in question, and not just the ability to find a couple of references and meatgrinder them into the plagiaristic pablum that passes for authorship on Wikipedia. Such people are fairly few and far between, and virtually all of them have the vain belief that they should be compensated for the use of their talents. There are still a handful on Wikipedia, but mostly with expertise only in topics that are of fairly little interest to the general public. Far more have been chased off the site by the community's unrelenting antipathy to experts and dogged insistence on letting truth be decided by whoever can recruit a nastier mob, rather than by such quaintly outmoded concepts as evidence and rationality.

So there is a huge article space that Wikipedia is lacking in, and can never fill, because it's structurally incapable of doing so. It's also what keeps Wikipedia from credibly claiming to be an encyclopedia.
Peter Damian
QUOTE(Tarc @ Sat 29th October 2011, 3:27am) *

Is that necessarily a bad thing, though? Take any topic, let's say the Beatles. Editors have gone there over the years, created articles on the band, albums, tours, controversies, places, and influences. A Beatles fan comes along post-2008, and finds most everything he can think of to be covered already. Same with others, from WWII to Lost to Lady Gaga, perhaps there is a saturation point where there just isn't much new content to generate, therefore less editor participation.

Of course there are current events to write about, from the legitimate (Libyan revolution) to the asinine (anything the ARSEholes promote...listeria outbreaks, crazy guys who let exotic animals loose, Rebecca Black clones, etc...) but that has always been the case.


Ah yes, this is the 'already contains the sum of human knowledge' argument.

http://ocham.blogspot.com/2011/10/crowdsou...philosophy.html

QUOTE(Kelly Martin @ Sat 29th October 2011, 5:04am) *

So there is a huge article space that Wikipedia is lacking in, and can never fill, because it's structurally incapable of doing so. It's also what keeps Wikipedia from credibly claiming to be an encyclopedia.


Why aren't Wikipedians persuaded of this?
iii
QUOTE(Peter Damian @ Sat 29th October 2011, 6:12am) *


QUOTE(Kelly Martin @ Sat 29th October 2011, 5:04am) *

So there is a huge article space that Wikipedia is lacking in, and can never fill, because it's structurally incapable of doing so. It's also what keeps Wikipedia from credibly claiming to be an encyclopedia.


Why aren't Wikipedians persuaded of this?


Because it ruins the fantasy that the amateurs in charge are capable of making a quality encyclopedia.
thekohser
QUOTE(Peter Damian @ Sat 29th October 2011, 6:12am) *

QUOTE(Kelly Martin @ Sat 29th October 2011, 5:04am) *

So there is a huge article space that Wikipedia is lacking in, and can never fill, because it's structurally incapable of doing so. It's also what keeps Wikipedia from credibly claiming to be an encyclopedia.


Why aren't Wikipedians persuaded of this?


Because the things Wikipedians really care about (anti-Scientology, frenum rings, Pokemon, and episodes of Power Rangers) are pretty well filled in. "Their" encyclopedia is masterfully serving their needs, right now.
It's the blimp, Frank
QUOTE(communicat @ Sat 29th October 2011, 1:24pm) *

Contrary to what is conspicuously omitted in the WP article, Gaddafi (Muammar al-Qathafi) inherited the poorest country in the world and turned it into one of the richest in Africa. He provided Libyans with literacy and a free education, and then paid for University grants. Ten per cent of Libyan students studied abroad, in Europe and the USA, paid by the Libyan state and with board and lodging included. What dictator educates his people?

He also gave each married couple 50,000 USD to settle down, he paid for half the first car, he provided interest-free bank loans, he provided free medical assistance, he built the world's most advanced irrigation system, bringing water to most of Libya, across the desert; he provided farmers with land, seeds, tools and instruction.

He inherited the poorest country in the world, nationalized its formerly Western controlled oil industry, and gave the country the highest Human Development Index in Africa. He helped free Africans from the yoke of imperialism and colonialism, he provided Africans with satellites to free them from crippling payments to western systems, he set up loans so that Africans would be freed of eternally paying interest to foreign banks. He paid revenue from oil directly into the bank accounts of the Libyan people.



And he made this famous statement:

QUOTE
We are content and happy if Obama can stay forever as the president of the United States of America.

http://thehill.com/blogs/blog-briefing-roo...esident-forever
Detective
QUOTE(Peter Damian @ Sat 29th October 2011, 11:12am) *

QUOTE(Kelly Martin @ Sat 29th October 2011, 5:04am) *

So there is a huge article space that Wikipedia is lacking in, and can never fill, because it's structurally incapable of doing so. It's also what keeps Wikipedia from credibly claiming to be an encyclopedia.


Why aren't Wikipedians persuaded of this?

Because the Wikipedians who matter don't give two shakes of a dead lamb's tail about its credibility as an encyclopedia. They know perfectly well that among those who know, Wikipedia can never have credibility. All they worry about is getting their own way, and basking in Wikipedia's Google rankings.
radek
QUOTE(Peter Damian @ Sat 29th October 2011, 5:12am) *

QUOTE(Tarc @ Sat 29th October 2011, 3:27am) *

Is that necessarily a bad thing, though? Take any topic, let's say the Beatles. Editors have gone there over the years, created articles on the band, albums, tours, controversies, places, and influences. A Beatles fan comes along post-2008, and finds most everything he can think of to be covered already. Same with others, from WWII to Lost to Lady Gaga, perhaps there is a saturation point where there just isn't much new content to generate, therefore less editor participation.

Of course there are current events to write about, from the legitimate (Libyan revolution) to the asinine (anything the ARSEholes promote...listeria outbreaks, crazy guys who let exotic animals loose, Rebecca Black clones, etc...) but that has always been the case.


Ah yes, this is the 'already contains the sum of human knowledge' argument.

http://ocham.blogspot.com/2011/10/crowdsou...philosophy.html

QUOTE(Kelly Martin @ Sat 29th October 2011, 5:04am) *

So there is a huge article space that Wikipedia is lacking in, and can never fill, because it's structurally incapable of doing so. It's also what keeps Wikipedia from credibly claiming to be an encyclopedia.


Why aren't Wikipedians persuaded of this?


Well, this is just going to echo what some other people are saying/have said, but basically think of the word "encyclopedia" as a brand name. The hard way to establish a brand name - essentially "reputation capital" - is by providing a quality product consistently over a long period of time. But this takes lots of effort, organization, and basically... real work. So the easy way is to appropriate a brand name, in this particular case the word "encyclopedia". Then you get at least some of the benefits of the reputation (some, because not everyone is fooled), with much much much less real work.

And as long as you can maintain the veneer that the brand name is appropriate, it can work. For a while. Then the realization starts seeping in that the brand name might not be so reputable anymore. So you try and stave that off by publishing misleading and irrelevant (though, I think, essentially true) statistics about the number of articles in your "encyclopedia". You "cook the books" by including low quality (analogy: crappy, high risk "toxic assets") one or two sentence stubs along with whatever legitimate assets you actually possess. It looks good on paper and, at least temporarily, you fool some people into accepting that you are what your brand name says. But long term, this kind of practice (in this case, substituting lots of low quality stubs and similar junk for real articles on topics, as Kelly says) is not sustainable. So yeah, that's why it's dying.

So why are Wikipedians putting up with this? Well, first, a lot of them are not - and they're expressing their dissatisfaction with basically the only real way that Wikipedia structures allow; by leaving the project. Which we see in the data above.

Why do the others remain? A few reasons.

First some people have an emotional stake in a particular sub topic. So their thinking is "yeah, this place sucks, but maybe I can at least make some small corner of it not suck. Let me tend to my own garden". Problem is that they are vastly outnumbered by all the crazies, psychopaths and power mongers. This has basically been my own justification of still doing something over there. But yeah, I'm coming more and more to realize how futile this is.

Aside from that, the smart but cynical ones realize it's all a sham but hope that the illusion can be sustained for a while longer. Or they realize that they're probably not going to be part of the "project" for much longer. The average tenure of a Wikipedian is pretty low. So if your horizon is like one or two years, then who cares if eventually everyone realizes the whole thing is a joke, as long as for the immediate future you get to strut your stuff around. These people have a stake in actively promoting the illusion that the project is doing just fine, so they LIKE all the goofy stubs, the inflated edit counts, and the SPAM that is more or less taking over in terms of article creation. (So I checked: apparently Dr. Blofeld is not a bot. He is, however, still basically a spammer)

The other group is composed of those that, as we say around here, "have drunk the kool aid". But every year this group gets smaller and smaller. The typical profile of such an editor, from what I've seen, is something like:
1. Get on Wikipedia. Talk a lot about how awesome it is.
2. Viciously attack anyone who disagrees that Wikipedia is awesome.
3. Retire after a year never to be heard from again.

Most of these are young kids who are just into the current "one hit wonder" band but whose enthusiasm burns out pretty quickly. And as the illusion that "the project" is doing good gets harder and harder to sustain, there are fewer and fewer of these.

Finally there are some who basically devoted their lives to the project, for better or worse. I'm perfectly willing to entertain the notion that their original intentions were noble. They meant well, but not having had any kind of experience with this kind of thing, the whole thing became a Frankenstein monster governed and motivated by its own idiocies and whims. BUT, these folks have invested so much into the project that there's no way in the world that they're going to admit that it sucks, because that would mean that they wasted the last... five? six? years of their lives on a dead end.

In economics this is called the "Sunk cost fallacy". If you invest a lot upfront into what turns out to be a bad investment project which ends up causing continued negative profits, then the rational thing to do is to write off the upfront investment and look for better alternatives. It is NOT to continue pouring money into the bad investment on the slim chance that things will turn around or the irrational belief that things aren't as bad as they seem. Of course, in the real world people are not rational so they commit the "sunk cost fallacy" all the time - because they become emotionally attached to something that is a failure just because it is THEIR failure. Same thing here. Some of the "old guard" persist exactly for this reason.

Anyway, just trying to un-hijack the thread here.

radek
QUOTE(EricBarbour @ Fri 28th October 2011, 10:31pm) *

QUOTE(radek @ Fri 28th October 2011, 7:43pm) *

Right now on the main page it says "3,780,433 articles in English". Nah. If they're lucky they got 2,268,259. (and what number of these are long but still crap?).

Not even that many. I figure that maybe 300,000 are actually useful (206,000 are more than 15k bytes). The rest are garbage, stubs, disambiguation pages, etc. Many long articles are crap, and IMO most of those "List of" things are utterly useless bullshit.

This is what Wikipedians are like.
Even when you beat them over the head with facts, they deny and deny and deny.


Yeah, that's why I said "if they're lucky". By my estimate, disambiguation pages are about another 10% on top of the 60% that are stubs. So we're down to 30% of 3.78 mil that MAY be actual articles.

And I didn't mention this before, but when I was collecting data on this, I was very generous about what was counted as a "stub". Basically I counted only articles which were a) categorized as stubs or b) shorter than FOUR sentences. But just doing it I noticed a significant number of articles that were like 5 or 6 sentences that were not categorized as stubs. So that'd add another 5% or 10% or so to that 60%.

I haven't done a sampling of the "List of" articles (which are mostly spam) but yes, it does seem like there's quite a number of these, just by clicking through the Random Article feature. (I can foresee them taking off the Random Article feature in the near future, as clicking that thing a few times really does show you very quickly how bad the "encyclopedia" really is. It takes like 20+ clicks before you actually arrive at an article that might be considered useful)

Above stub level though the 15k threshold is somewhat arbitrary. I usually use character count rather than k because it's easy to inflate k's by including all sorts of irrelevant and useless infoboxes and templates, as well as a bunch of crappy pictures. But usually, even many of the Britannica articles are not up to 15k.

It's sort of a non-linear, threshold kind of thing: below, I dunno, 2k or 3k, they're all crap. Above that threshold, size is no indicator of quality, which could still be atrocious even with lots of k. Hell, my favorite Wikipedia article of all time, on Wyandanch, New York, in its glory days used to clock in at 316k (I really wish I could somehow sneakily get FA status for that article and get it featured on the main page, but that'd be BAD!)

As far as stubs go, right now I've only got a small sample. But guess what the vintage of a typical article which is a stub today is? In other words, how long ago was an article that is a stub today actually created? If you believe the nonsense that people here are saying, then it's perfectly fine to spam/mass create a shitload of crappy stubs (polite term is "seed") because supposedly "they get improved over time". Bullshit. The average stub that exists on Wikipedia right now was created as a stub and has remained a stub for... almost the past four years. With one or zero subsequent meaningful edits (it probably got recategorized over and over again by AWB users a bunch, and some bots might have formatted something). Those millions of stubs stay stubs. And this estimate - of almost 4 years - EXCLUDES the recent flurry of Blofeld (and a few others) stubs (including these would bias the estimate towards recent creation).

So yes, the share of stubs in "new articles created" has been going up over time. If these stubs were ever improved to a meaningful level (since more or less, all articles start as stubs) then that'd be fine. But they're not. They're scraping the bottom of the barrel. Contra Tarc, it's not because they have to. It's because they're lazy and stupid and because, per Kelly, writing actual meaningful articles takes a lot of work. But that's not what they're in it for.

EricBarbour
QUOTE(radek @ Sat 29th October 2011, 1:11pm) *
First some people have an emotional stake in a particular sub topic. So their thinking is "yeah, this place sucks, but maybe I can at least make some small corner of it not suck. Let me tend to my own garden". Problem is that they are vastly outnumbered by all the crazies, psychopaths and power mongers.

Even the crazies and psychopaths with admin mops are starting to give up and walk away.

Remember Mr. Z-man? He's been scarce lately.
Remember Morwen, nerdgirl supreme and fixture at WP UK meets for years? She hasn't done anything since July.
Remember KnightLago? He was the Arbcom clerk. Consummate insider. He hasn't done anything since June.
Remember that nice young man Jake Wartenberg? A rabid Jimbo fan at one time. He quit in August.
JWSchmidt, a good contributor who wrote a lot of science articles, hasn't been active since April.

Those are only samples. Most of the others have cut back drastically on their work.
And the trend appears to be accelerating. They don't talk about it, they just don't
log in regularly anymore. And nobody desysops them, despite a lot of talk about
shutting off the accounts of inactive admins to keep them from being hijacked by
hackers. I am convinced this is a deliberate attempt to mislead the community.

QUOTE
The average tenure of a Wikipedian is pretty low.

I figure that possibly 60-75% of all Wikipedia accounts are socks that have never performed ANY edits.
Most of the accounts being created right now are being made with bots.
Why? Wargame! Entertainment! And this estimate could easily be low.

QUOTE
(So I checked: apparently Dr. Blofeld is not a bot. He is, however, still basically a spammer)

No, but he does use multiple bots. And is so profoundly crazy, no one can reason with him.
If any Wikipedia fanboy is likely to end up in a straitjacket, I would expect him to be candidate #1.

QUOTE
In economics this is called the "Sunk cost fallacy". If you invest a lot upfront into what turns out to be a bad investment project which ends up causing continued negative profits, then the rational thing to do is to write off the upfront investment and look for better alternatives. It is NOT to continue pouring money into the bad investment on the slim chance that things will turn around or the irrational belief that things aren't as bad as they seem. Of course, in the real world people are not rational so they commit the "sunk cost fallacy" all the time - because they become emotionally attached to something that is a failure just because it is THEIR failure. Same thing here. Some of the "old guard" persist exactly for this reason.

Well put, just as an aside: the coverage of economics and related issues on Wikipedia is terrible.
Most of the articles I've seen have been short and incoherent. Many of them (such as Coase theorem (T-H-L-K-D))
were written and pwned by libertarian maniacs. That's another hidden "feature" of English Wikipedia:
it's the don't-tax-me-bro-pedia.....
radek
Update:

Ok, I just clicked "Random Article" 141 times. It took me about 10 minutes - a time span during which, apparently, someone like Dr. Blofeld (and a few others claim this as well) can "create" about 50 articles. Then I counted the number of these random articles which were disambig pages or "List of" pages. I counted "March 02 1654" as a "List of" article, and "Barons/Earls/Lords of some Podunk Manor" type articles as "List of" as well.

It's a decent sample; the Central Limit Theorem should start kicking in at about 30, so it's a reasonable size.

Out of these 141 articles, 11.3% were disambig pages and 5% were "List of" pages.

The stubs were counted in the other category.

So we got:

Wikipedia articles:
60% definitely stubs
5% to 10% probably stubs
11% disambig pages
5% "List of" articles

So that's 81% to 86% of these 3.7 million Wikipedia articles. The other 19% to 14% are:

Not stubs
Not disambig pages
Not "List of articles"

but of course a lot of them could be Wyandanch, New York type stuff, puff pieces for weird religious cults, hoaxes, tributes to some tin-pot Stalinist dictator, Kohs' paid-for articles (NTTAWWT) and all the rest.
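
If anyone wants to redo this without all the clicking, it scripts easily enough. A rough sketch only: Special:Random and action=raw are real MediaWiki endpoints, but the stub/disambig/list heuristics below are my own crude cutoffs rather than any official definition, so the exact percentages will move with them.

CODE
import re
import time
from collections import Counter
from urllib.parse import unquote

import requests

WIKI = "https://en.wikipedia.org"

def fetch_random_article():
    # Following the redirect from Special:Random lands on a random article URL.
    resp = requests.get(WIKI + "/wiki/Special:Random", allow_redirects=True)
    title = unquote(resp.url.rsplit("/", 1)[-1]).replace("_", " ")
    # action=raw returns the page's wikitext.
    raw = requests.get(WIKI + "/w/index.php",
                       params={"title": title, "action": "raw"})
    return title, raw.text

def classify(title, wikitext):
    # Crude, assumption-laden heuristics; not Wikipedia's own categories.
    low = wikitext.lower()
    if title.startswith("List of"):
        return "list"
    if "{{disambig" in low or "{{dab" in low or "disambiguation}}" in low:
        return "disambig"
    if re.search(r"\{\{[^{}]*stub[^{}]*\}\}", low):
        return "stub"
    if len(re.findall(r"[.!?](\s|$)", wikitext)) < 4:  # under ~4 sentences
        return "stub"
    return "other"

def sample(n=100):
    counts = Counter()
    for _ in range(n):
        counts[classify(*fetch_random_article())] += 1
        time.sleep(0.5)  # be polite to the servers
    return counts

if __name__ == "__main__":
    print(sample(100))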
radek
QUOTE(EricBarbour @ Sat 29th October 2011, 3:58pm) *

QUOTE(radek @ Sat 29th October 2011, 1:11pm) *
First some people have an emotional stake in a particular sub topic. So their thinking is "yeah, this place sucks, but maybe I can at least make some small corner of it not suck. Let me tend to my own garden". Problem is that they are vastly outnumbered by all the crazies, psychopaths and power mongers.

Even the crazies and psychopaths with admin mops are starting to give up and walk away.

Remember Mr. Z-man? He's been scarce lately.
Remember Morwen, nerdgirl supreme and fixture at WP UK meets for years? She hasn't done anything since July.
Remember KnightLago? He was the Arbcom clerk. Consummate insider. He hasn't done anything since June.
Remember that nice young man Jake Wartenberg? A rabid Jimbo fan at one time. He quit in August.
JWSchmidt, a good contributor who wrote a lot of science articles, hasn't been active since April.

Those are only samples. Most of the others have cut back drastically on their work.
And the trend appears to be accelerating. They don't talk about it, they just don't
log in regularly anymore. And nobody desysops them, despite a lot of talk about
shutting off the accounts of inactive admins to keep them from being hijacked by
hackers. I am convinced this is a deliberate attempt to mislead the community.

QUOTE
The average tenure of a Wikipedian is pretty low.

I figure that possibly 60-75% of all Wikipedia accounts are socks that have never performed ANY edits.
Most of the accounts being created right now are being made with bots.
Why? Wargame! Entertainment! And this estimate could easily be low.

QUOTE
(So I checked: apparently Dr. Blofeld is not a bot. He is, however, still basically a spammer)

No, but he does use multiple bots. And is so profoundly crazy, no one can reason with him.
If any Wikipedia fanboy is likely to end up in a straitjacket, I would expect him to be candidate #1.

QUOTE
In economics this is called the "Sunk cost fallacy". If you invest a lot upfront into what turns out to be a bad investment project which ends up causing continued negative profits, then the rational thing to do is to write off the upfront investment and look for better alternatives. It is NOT to continue pouring money into the bad investment on the slim chance that things will turn around or the irrational belief that things aren't as bad as they seem. Of course, in the real world people are not rational so they commit the "sunk cost fallacy" all the time - because they become emotionally attached to something that is a failure just because it is THEIR failure. Same thing here. Some of the "old guard" persist exactly for this reason.

Well put, just as an aside: the coverage of economics and related issues on Wikipedia is terrible.
Most of the articles I've seen have been short and incoherent. Many of them (such as Coase theorem (T-H-L-K-D))
were written and pwned by libertarian maniacs. That's another hidden "feature" of English Wikipedia:
it's the don't-tax-me-bro-pedia.....


On economics you get it from both... well, three ends. The libertarian/Austrian school is very active. So is the Marxist/Heterodox school. This would actually be fine if they stuck to articles which are really relevant to their topics. But - given Wikipedia's nature, somewhat understandably - both these groups basically try to go through and insert links/irrelevant info that they like into any economics related article they can find. So even articles on purely technical, non-controversial topics end up being overwhelmed with (either) the Austrian or Heterodox stuff. The third component is that you just get lots of crazies along the lines of "the key to economic development is transcendental meditation" or something and it's all just too much.

In this particular case, the info that is missing is that the Coase Theorem is not even a Theorem.
EricBarbour
QUOTE(radek @ Sat 29th October 2011, 2:24pm) *

Wikipedia articles:
60% definitely stubs
5% to 10% probably stubs
11% disambig pages
5% "List of" articles

So that's 81% to 86% of these 3.7 million Wikipedia articles. The other 19% to 14% are:

Not stubs
Not disambig pages
Not "List of articles"

That's about what I've seen in my tests. The stub percentage is increasing faster than the good-article percentage, because the stubs are being generated automatically and real article writing is slowly fading out.

And everybody is lying about it. Most of the statistical articles about WP's size and balance are
hopelessly out of date, and rarely mention the stub percentages.

(Did you know that Wikiproject Stub sorting, which was once a major and very busy project,
started dying off in 2011?)
Peter Damian
The decline in editing (as opposed to decline in editors) is the one that hits me. I have been looking at edit counts per page for each of the 740-odd admins, which is another interesting statistic but more about that later. The toolserver application that does this also (usually) reports editing frequency.

What leapt out after a number of trials was the huge decline in number of edits for nearly all of the current administrators. Our old friend Fetchcomms is typical.

http://toolserver.org/~soxred93/pcount/ind...&wiki=wikipedia

Agree with all the stuff said here about stubs, random articles and so on.
timbo
I was intrigued by a post above sampling WP contents via the RANDOM ARTICLE link. However, rather than using some arbitrarily determined size demarcation to differentiate between empty space and substance, I used my eyes.

One knows a stub when one sees one, right?

So I hit the RANDOM ARTICLE link 100 times during football this afternoon and charted the results.

All of these are raw numbers and also percentages for my sample...

Useful articles = 51

Stubs and Stubbish articles = 36

Disambiguation Pages = 9

Lists = 4


===========

For the "useful articles," subcategories include

Biographical = 12

Sports topics or biographies = 8

Historical topics = 7

Geographical Articles = 5

Commercial Entities = 5

Pop Cultural topics and biographies = 4

Military History = 3

Scientific/Zoological = 2

Geology = 1

Economics = 1

Medical = 1

High Culture = 1

Contemporary books = 1


============


Most of the stub articles were either geographical (which can be useful) or scientific (insect species, etc.).

In short, the breathless analysis that Wikipedia is degenerating into a vast mass of bot-generated stubs seems absolutely incorrect.

Don't believe me?

Hit the RANDOM ARTICLE 100 times and chart your findings.

I dare you.


t
radek
QUOTE(timbo @ Sun 30th October 2011, 2:25pm) *

I was intrigued by a post above sampling WP contents via the RANDOM ARTICLE link. However, rather than using some arbitrarily determined size demarcation to differentiate between empty space and substance, I used my eyes.

One knows a stub when one sees one, right?

So I hit the RANDOM ARTICLE link 100 times during football this afternoon and charted the results.

All of these are raw numbers and also percentages for my sample...

Useful articles = 51

Stubs and Stubbish articles = 36

Disambiguation Pages = 9

Lists = 4


===========

For the "useful articles," subcategories include

Biographical = 12

Sports topics or biographies = 8

Historical topics = 7

Geographical Articles = 5

Commercial Entities = 5

Pop Cultural topics and biographies = 4

Military History = 3

Scientific/Zoological = 2

Geology = 1

Economics = 1

Medical = 1

High Culture = 1

Contemporary books = 1


============


Most of the stub articles were either geographical (which can be useful) or scientific (insect species, etc.).

In short, the breathless analysis that Wikipedia is degenerating into a vast mass of bot-generated stubs seems absolutely incorrect.

Don't believe me?

Hit the RANDOM ARTICLE 100 times and chart your findings.

I dare you.


t


Your numbers on disambig pages and lists make sense. However, your numbers on stubs do not. I don't know how you got it, but like I said above, I've already taken quite a large sample of articles - quite a bit more than 100. And stubs are consistently about 60% of all articles (as you sample it the % varies between something like 58% and 72%). I'm guessing you're counting some stubs as "useful" according to your own criteria. But in this kind of exercise the criteria should be well defined and precise. Otherwise this is just a more fancy way of "making shit up".

The last batch of sampling had 102 articles, of which 61, or 59.8% were stubs. I just clicked it 100 times, like you did, and got this (# of stubs out of each ten):

7
4
6
5
7
8
8
6
4
6

so that's 61 out of 100 that are stubs.

Like I said...
timbo
QUOTE(radek @ Sun 30th October 2011, 12:42pm) *

Your numbers on disambig pages and lists make sense. However, your numbers on stubs do not. I don't know how you got it, but like I said above, I've already taken quite a large sample of articles - quite a bit more than 100. And stubs are consistently about 60% of all articles (as you sample it the % varies between something like 58% and 72%). I'm guessing you're counting some stubs as "useful" according to your own criteria. But in this kind of exercise the criteria should be well defined and precise. Otherwise this is just a more fancy way of "making shit up".


You are right -- how one counts "stubs and stubbish articles" makes an enormous difference. If one defines this as an arbitrary file size, the number selected determines everything. Why 15K? Why not 5K?

Here's how I define a stub:

(1) An article marked as a stub.

(2) An article of no more than 2 or 3 paragraphs without footnotes.


A sourced 3 paragraph article I consider a "useful" piece -- in that a person searching the term would receive a more or less satisfactory positive response to their inquiry.

Some stubs -- little geographic snippets -- are perfectly satisfactory positive responses to queries about those locations.

If one wants to prove the thesis that Wikipedia is submerged in stubs, one arbitrarily sets the size for "real" articles larger rather than smaller.

If one wants to prove the thesis that Wikipedia is in perfectly fine shape, one just has to take a reasoned, subjective look at what are true "stubs" and what are valid sourced short articles.

You see the glass as only 35% full; I see it as about 35% empty.


tim
Guido den Broeder
Note that the random number generator that the software uses is very poor, with high autocorrelation (and note that no article on Wikipedia tells you anything about the quality of RNGs).

A proper sample should therefore contain far more than 100 articles.
Kelly Martin
QUOTE(Guido den Broeder @ Sun 30th October 2011, 8:27pm) *

Note that the random number generator that the software uses is very poor, with high autocorrelation (and note that no article on Wikipedia tells you anything about the quality of RNGs).
In addition, the randomization process that MediaWiki uses results in a nonuniform distribution: some articles will be returned *far* more often than other articles.

Each page, when initially created, is assigned a random number between 0 and 1. The "random page" function works by generating another random number between 0 and 1 and finding the page whose assigned number is the smallest one greater than the number just generated. Because those assigned numbers are scattered randomly rather than spaced evenly, some pages sit above wider gaps than others and will be returned more often. This is in addition to the fairly poor PRNG that PHP uses.

From a statistical sampling standpoint this bias can probably be ignored, because it's almost certainly uncorrelated with any metric of actual interest. The only exception would be that I would expect slightly higher view and edit rates for pages with greater random exposure, on the grounds that they'll get slightly more views from people using "random page", but I suspect this effect is extremely small.
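
To make the mechanism concrete, here is a toy model of that selection scheme (not MediaWiki's actual code; the titles and values are made up purely for illustration):

CODE
import bisect
import random
from collections import Counter

# Each page keeps a fixed random value assigned at creation; "random page"
# returns the page with the smallest stored value >= a fresh random draw.
pages = {"Aardvark": 0.12, "Beatles discography": 0.47,
         "Coase theorem": 0.51, "Wyandanch, New York": 0.93}

sorted_pages = sorted(pages.items(), key=lambda kv: kv[1])
values = [v for _, v in sorted_pages]

def random_page():
    r = random.random()
    i = bisect.bisect_left(values, r)   # first stored value >= r
    if i == len(values):                # past the largest value: wrap around
        i = 0
    return sorted_pages[i][0]

# Pages sitting just above a wide gap get hit far more often; here
# "Wyandanch, New York" (gap 0.51 -> 0.93) wins roughly 42% of the draws.
print(Counter(random_page() for _ in range(100_000)))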
radek
QUOTE(Guido den Broeder @ Sun 30th October 2011, 8:27pm) *

Note that the random number generator that the software uses is very poor, with high autocorrelation (and note that no article on Wikipedia tells you anything about the quality of RNGs).

A proper sample should therefore contain far more than 100 articles.


I've been wondering about that! One thing I noticed while clicking through the thing is that if you get an article named "xxx blah blah blah blah ARCTIC blah blah blah xxx" then quite often the next article you get is named "blah xxx blah blah xxx ARCTIC blah xxx blah". Which does suggest some weird autocorrelation in the algorithm. So it might very well be that if you click ten articles and only 4 of them show up as stubs, then it is quite likely that out of the next 10 the most likely value is 4 as well.

Any idea on how the Random Article feature actually works? Like what's the script or the random number generator underlying it?

Having said that, my sample is pretty close to 1000 now and the 60% of articles being stubs is fairly robust, so I'm pretty sure it's representative.

QUOTE(Kelly Martin @ Sun 30th October 2011, 8:44pm) *

QUOTE(Guido den Broeder @ Sun 30th October 2011, 8:27pm) *

Note that the random number generator that the software uses is very poor, with high autocorrelation (and note that no article on Wikipedia tells you anything about the quality of RNGs).
In addition, the randomization process that MediaWiki uses results in a nonuniform distribution: some articles will be returned *far* more often than other articles.

Each page, when initially created, is assigned a random number between 0 and 1. The "random page" function works by generating another random number between 0 and 1, and finding the page whose random number is the least such number greater than the random number just created. Because the random numbers assigned to pages are random, rather than uniform, some pages will be returned more than others. This is in addition to the fairly poor PRNG that PHP uses.

From a statistical sampling standpoint this bias can probably be ignored because it's almost certainly uncorrelated to any metric of actual interest. The only exception would be that I would expect a slightly higher view and edit rates for pages with greater random exposure, on the grounds that they'll get slightly more views, from people using "random page", but I suspect this effect is extremely small.


Yeah, this is why I invoked the Central Limit Theorem above - with a sufficiently large sample (which isn't that large, 30 or so) the underlying distribution of the randomization process shouldn't matter, uniform or not. But autocorrelation CAN.
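
For what it's worth, a quick back-of-the-envelope on sample size, treating the clicks as independent draws (so setting the autocorrelation question aside): the standard error of a sample proportion is

$$\mathrm{SE}(\hat p)=\sqrt{\frac{p(1-p)}{n}}\approx\sqrt{\frac{0.6\times 0.4}{100}}\approx 0.049,$$

so a 100-click sample of a true 60% stub share carries a 95% margin of roughly plus or minus 10 percentage points, while a sample near 1000 narrows that to about plus or minus 3.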
Silver seren
I felt like trying this too with 100 articles. The "Not sure" ones are in between being a stub and not being a stub (around 2-3 paragraphs long). I was rather surprised at how many of the biographies were not contemporary people but actually historic figures and such.

Stub - 29

Not stub - 46

Disambiguation - 8

List - 6

Not sure - 10

Good article - 1

Types:

Places - 20

Living things - 4

Television/Film/etc. - 1

Company - 4

Music - 9

Biography - 27

Dance - 1

Politics - 4

Transportation - 1

Sports - 2

Conflict - 1

Buildings/Monuments/etc. - 2

Space - 2

Education - 1

Television/Radio/etc. Station - 2

Books/Magazines/etc. - 2

Dates - 1

Laws - 1

Newspapers - 1

Board games - 1

Internet - 1

Ships - 1

Weather - 2

Genetics - 1
Kelly Martin
QUOTE(radek @ Sun 30th October 2011, 9:00pm) *
Yeah, this is why I invoked the Central Limit Theorem above - with a sufficiently large sample (which isn't that large, 30 or so) the underlying distribution of the randomization process shouldn't matter, uniform or not. But autocorrelation CAN.
I would not expect there to be any observable autocorrelation in a sequence of invocations of Special:Random. Each request is going to be processed by a new instance of the PHP engine, with a new random seed initialized from system entropy (probably the system clock) at the start of the request. The random number generator is reseeded for each request, rather than being maintained and reused, as far as I know. Also, because of the large number of webservers in use at the Wikimedia farm, any given request is unlikely to be serviced by the same webserver that serviced a previous request, and the random seed is definitely not stored in portable session data. There is no call to srand (or mt_srand) in MediaWiki's code, at least not in code normally used during routine operations (there are calls to srand in certain maintenance and test procedures, but those are not used in normal operations).
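
If anyone wants to check the claim empirically rather than by eyeball, something along these lines would do (a rough sketch: it pulls Special:Random repeatedly, records the article title from the final redirect, and counts a couple of crude similarity measures between nearby results; substitute whatever measure you prefer):

import time
import requests

RANDOM_URL = "https://en.wikipedia.org/wiki/Special:Random"
N = 200          # keep the sample modest and polite
titles = []
session = requests.Session()
session.headers["User-Agent"] = "autocorrelation-check/0.1 (personal test)"

for _ in range(N):
    # Special:Random redirects to a random article; the final URL holds the title.
    resp = session.get(RANDOM_URL, allow_redirects=True, timeout=30)
    titles.append(resp.url.rsplit("/", 1)[-1])
    time.sleep(1)  # don't hammer the servers

# Crude checks: consecutive titles sharing their first word, and exact repeats
# within a window of 10 draws.
shared_first_word = sum(
    a.split("_")[0].lower() == b.split("_")[0].lower()
    for a, b in zip(titles, titles[1:])
)
near_repeats = sum(
    titles[i] in titles[max(0, i - 10):i] for i in range(1, len(titles))
)
print("consecutive pairs sharing a first word:", shared_first_word, "of", N - 1)
print("exact repeats within 10 draws:", near_repeats)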
radek
QUOTE(Silver seren @ Sun 30th October 2011, 9:32pm) *

I felt like trying this too with 100 articles. The "Not sure" ones are in between being a stub and not being a stub (around 2-3 paragraphs long). I was rather surprised at how many of the biographies were not contemporary people but actually historic figures and such.

Stub - 29

Not stub - 46

Disambiguation - 8

List - 6

Not sure - 10

Good article - 1


Ignoring disambigs, lists, and your "not sure" we get 29/47=61.7% which is again close to 60% stubs
Silver seren
QUOTE(radek @ Mon 31st October 2011, 2:44am) *

QUOTE(Silver seren @ Sun 30th October 2011, 9:32pm) *

I felt like trying this too with 100 articles. The "Not sure" ones are in between being a stub and not being a stub (around 2-3 paragraphs long). I was rather surprised at how many of the biographies were not contemporary people but actually historic figures and such.

Stub - 29

Not stub - 46

Disambiguation - 8

List - 6

Not sure - 10

Good article - 1


Ignoring disambigs, lists, and your "not sure" we get 29/47=61.7% which is again close to 60% stubs


Um...I think you have it backwards. It would be 40% stubs and 60% not stubs.
radek
QUOTE(Silver seren @ Sun 30th October 2011, 10:05pm) *

QUOTE(radek @ Mon 31st October 2011, 2:44am) *

QUOTE(Silver seren @ Sun 30th October 2011, 9:32pm) *

I felt like trying this too with 100 articles. The "Not sure" ones are in between being a stub and not being a stub (around 2-3 paragraphs long). I was rather surprised at how many of the biographies were not contemporary people but actually historic figures and such.

Stub - 29

Not stub - 46

Disambiguation - 8

List - 6

Not sure - 10

Good article - 1



You're right, take it as stubs/not stubs + not sure. Same thing.
Ignoring disambigs, lists, and your "not sure" we get 29/47=61.7% which is again close to 60% stubs


Um...I think you have it backwards. It would be 40% stubs and 60% not stubs.


You're right, but take it as stubs/not stubs + not sure. Same thing.
Guido den Broeder
QUOTE(Kelly Martin @ Mon 31st October 2011, 3:37am) *

QUOTE(radek @ Sun 30th October 2011, 9:00pm) *
Yeah, this is why I invoked the Central Limit Theorem above - with a sufficiently large sample (which isn't that large, 30 or so) the underlying distribution of the randomization process shouldn't matter, uniform or not. But autocorrelation CAN.
I would not expect there to be any observable autocorrelation in a sequence of invocations of Special:Random. Each request is going to be processed by a new instance of the PHP engine, with a new random seed initialized from system entropy (probably the system clock) at the start of the request. The random number generator is reseeded for each request, rather than being maintained and reused, as far as I know. Also, because of the large number of webservers in use at the Wikimedia farm, any given request is unlikely to be serviced by the same webserver that serviced a previous request, and the random seed is definitely not stored in portable session data. There is no call to srand (or mt_srand) in MediaWiki's code, at least not in code normally used during routine operations (there are calls to srand in certain maintenance and test procedures, but those are not used in normal operations).

I didn't expect it either, but it is clearly observable. Could the cache interfere with the reseeding?
Kelly Martin
QUOTE(Guido den Broeder @ Mon 31st October 2011, 6:41am) *
I didn't expect it either, but it is clearly observable. Could the cache interfere with the reseeding?
Special page results are uncacheable, so that shouldn't be happening. The only thing that makes sense is that the squids are using session affinity to send you back to the same webserver for each request, and the servers are preserving the random seed instead of reseeding. Wikimedia is running an accelerated version of PHP; it's possible that the engine doesn't reseed at the start of *every* script run, and is instead persisting the random seed between sessions. I looked around the web last night for some information on how the random seed is managed when PHP is embedded in a webserver, but couldn't find anything.

I still don't understand why they don't just periodically reseed from /dev/random. /dev/random is a fairly high quality entropy source, and on a system with a high level of network activity the rate at which entropy is created is fairly high, so there should be enough entropy available to reseed fairly often.
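
For the shape of that idea, here's a rough sketch of periodic reseeding from /dev/random (in Python rather than PHP, and purely illustrative - the interval and the details are made up):

import random
import struct
import time

RESEED_INTERVAL = 60.0   # seconds between reseeds; made-up figure
_last_reseed = 0.0

def maybe_reseed():
    """Periodically reseed the PRNG from /dev/random (Linux-specific)."""
    global _last_reseed
    now = time.monotonic()
    if now - _last_reseed >= RESEED_INTERVAL:
        # May block briefly if the kernel's entropy pool is low.
        with open("/dev/random", "rb") as f:
            seed = struct.unpack("<Q", f.read(8))[0]   # 8 bytes -> 64-bit int
        random.seed(seed)
        _last_reseed = now

def random_draw():
    maybe_reseed()
    return random.random()

if __name__ == "__main__":
    print(random_draw(), random_draw())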
radek
QUOTE(radek @ Sun 30th October 2011, 10:15pm) *

QUOTE(Silver seren @ Sun 30th October 2011, 10:05pm) *

QUOTE(radek @ Mon 31st October 2011, 2:44am) *

QUOTE(Silver seren @ Sun 30th October 2011, 9:32pm) *

I felt like trying this too with 100 articles. The "Not sure" ones are in between being a stub and not being a stub (around 2-3 paragraphs long). I was rather surprised at how many of the biographies were not contemporary people but actually historic figures and such.

Stub - 29

Not stub - 46

Disambiguation - 8

List - 6

Not sure - 10

Good article - 1



You're right, take it as stubs/not stubs + not sure. Same thing.
Ignoring disambigs, lists, and your "not sure" we get 29/47=61.7% which is again close to 60% stubs


Um...I think you have it backwards. It would be 40% stubs and 60% not stubs.


You're right, but take it as stubs/not stubs + not sure. Same thing.


Actually you're right. In fact I woke up this morning thinking "Silver seren is right", strangely enough, which is a weird first thought to have upon waking up. Anyway, the difference is that you're taking stubs out of ALL articles while I'm talking about stubs out of non-disambig, non-list articles. So really the % of Wikipedia articles that are stubs should be something like 60% × 85% (with the other 15% being disambigs and lists), i.e. about 51%.
This is a "lo-fi" version of our main content. To view the full version with more information, formatting and images, please click here.