Full Version: What really is in Wikipedia?
Peter Damian
I confront the following paradox: according to the statistics, Wikipedia is an enormous encyclopedia, with over x million articles, the largest repository of human knowledge &c. But when I look in Wikipedia for any subject I actually have detailed knowledge about, it contains very little in comparison to my standard reference sources. How do we explain this paradox?

I suspect because it contains a lot of rubbish. But how would I go about investigating this? Is there any way of querying the database to give a list of articles, for example? Can I use the category system to list all biographies and then determine what percentage are of living people? Could I go further to determine how much space is devoted to people born before 1900 as opposed to afterwards? What I am trying to get are some hard facts and figures to support anecdotal evidence that Wikipedia is heavily weighted towards 'pop culture', personalities and celebrities who are living now, material which has been produced very recently and so on.
thekohser
QUOTE(Peter Damian @ Mon 20th July 2009, 11:17am) *

I confront the following paradox: according to the statistics, Wikipedia is an enormous encyclopedia, with over x million articles, the largest repository of human knowledge &c. But when I look in Wikipedia for any subject I actually have detailed knowledge about, it contains very little in comparison to my standard reference sources. How do we explain this paradox?

I suspect because it contains a lot of rubbish. But how would I go about investigating this? Is there any way of querying the database to give a list of articles, for example? Can I use the category system to list all biographies and then determine what percentage are of living people? Could I go further to determine how much space is devoted to people born before 1900 as opposed to afterwards? What I am trying to get are some hard facts and figures to support anecdotal evidence that Wikipedia is heavily weighted towards 'pop culture', personalities and celebrities who are living now, material which has been produced very recently and so on.


You would be wasting your time, Peter Damian, seeking such hard data to prove something that (surely) has already been proven by some university or Wikipedia committee of some kind. We know that enormous numbers of Wikipedia articles are consumed by things like TV show episodes, actors, asteroids, species of plants and various smaller-than-a-mouse animals, towns and villages, and recent events galore.

Your paradox is probably accentuated by the relatively rare air you're breathing in medieval philosophy. Think of 100 people at a soccer match. How many of them have ever visited a Wikipedia page about medieval philosophy? I would guess 2. How many of them have visited Wikipedia looking for more info about Megan Fox? I would guess 10. So, there is a 4-times likelihood that content has been built up around subjects like Megan Fox (which you would probably term "rubbish") than around subjects like the Sum of Logic.
Eva Destruction
QUOTE(Peter Damian @ Mon 20th July 2009, 4:17pm) *

I suspect because it contains a lot of rubbish. But how would I go about investigating this? Is there any way of querying the database to give a list of articles, for example? Can I use the category system to list all biographies and then determine what percentage are of living people? Could I go further to determine how much space is devoted to people born before 1900 as opposed to afterwards? What I am trying to get are some hard facts and figures to support anecdotal evidence that Wikipedia is heavily weighted towards 'pop culture', personalities and celebrities who are living now, material which has been produced very recently and so on.

Provided the articles are correctly categorized, then yes; each category lists the current number of members at the top. You'd need to do some number crunching to add together the assorted subcategories, though (e.g., the total number of 19th century biographies = Category:1801 births + Category:1802 births…). AnomieBOT should calculate the totals for you automatically if you provide a list of the relevant categories on the bot's talkpage (it can handle "all subcategories" requests).

To answer your other question, there are (at the time of writing) 717,878 biographies on Wikipedia, and 392,424 BLPs. (The percentage isn't as straightforward as it looks, as some articles categorized as BLPs aren't true biographies, such as articles on bands where the band members are alive.)
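
As a rough illustration of the percentage in question, here is a minimal Python sketch using only the two figures quoted above. The function name `blp_share` is made up for this example, and the result carries the same caveat: it overstates the share somewhat if some BLP-tagged articles are not true biographies.

```python
def blp_share(blps: int, biographies: int) -> float:
    """Return BLPs as a percentage of all biographies."""
    if biographies <= 0:
        raise ValueError("biographies must be a positive count")
    return 100.0 * blps / biographies


# Figures quoted in the post above (July 2009).
share = blp_share(392_424, 717_878)
print(f"BLPs make up about {share:.1f}% of biographies")
```

On those numbers the share comes out a little under 55%.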
A Horse With No Name
QUOTE(thekohser @ Mon 20th July 2009, 11:26am) *
How many of them have visited Wikipedia looking for more info about Megan Fox? I would guess 10.


Oh? Who are the other nine guys?
Peter Damian
QUOTE(Eva Destruction @ Mon 20th July 2009, 4:26pm) *

Provided the articles are correctly categorized, then yes; each category lists the current number of members at the top. You'd need to do some number crunching to add together the assorted subcategories, though (e.g., the total number of 19th century biographies = Category:1801 births + Category:1802 births…). AnomieBOT should calculate the totals for you automatically if you provide a list of the relevant categories on the bot's talkpage (it can handle "all subcategories" requests).

To answer your other question, there are (at the time of writing) 717,878 biographies on Wikipedia, and 392,424 BLPs. (The percentage isn't as straightforward as it looks, as some articles categorized as BLPs aren't true biographies, such as articles on bands where the band members are alive.)


That's excellent, thanks. Presumably there's no way out of doing the '1801, 1802...' thing 100 times?
Eva Destruction
QUOTE(Peter Damian @ Mon 20th July 2009, 4:33pm) *

QUOTE(Eva Destruction @ Mon 20th July 2009, 4:26pm) *

Provided the articles are correctly categorized, then yes; each category lists the current number of members at the top. You'd need to do some number crunching to add together the assorted subcategories, though (e.g., the total number of 19th century biographies = Category:1801 births + Category:1802 births…). AnomieBOT should calculate the totals for you automatically if you provide a list of the relevant categories on the bot's talkpage (it can handle "all subcategories" requests).

To answer your other question, there are (at the time of writing) 717,878 biographies on Wikipedia, and 392,424 BLPs. (The percentage isn't as straightforward as it looks, as some articles categorized as BLPs aren't true biographies, such as articles on bands where the band members are alive.)


That's excellent, thanks. Presumably there's no way out of doing the '1801, 1802...' thing 100 times?

None that I'm aware of, other than getting a bot to go through the subcategories of Category:1800s births. The whole thing is complicated further by Category:Year of birth unknown, but adding the totals should give a rough ballpark figure.

In addition to Greg's comments above, it's also worth noting that any reference work is going to have a disproportionate number of 20th century biographies, because not only were there a lot more people, but record-keeping became systematic for the first time. (The names and biographies of most actors of classical Athens aren't recorded even if someone wanted to list them; the names and biographies of everyone who ever appeared on The Love Boat can be checked and verified.)
Peter Damian
QUOTE(thekohser @ Mon 20th July 2009, 4:26pm) *


You would be wasting your time, Peter Damian, seeking such hard data to prove something that (surely) has already been proven by some university or Wikipedia committee of some kind. We know that enormous numbers of Wikipedia articles are consumed by things like TV show episodes, actors, asteroids, species of plants and various smaller-than-a-mouse animals, towns and villages, and recent events galore.

Your paradox is probably accentuated by the relatively rare air you're breathing in medieval philosophy. Think of 100 people at a soccer match. How many of them have ever visited a Wikipedia page about medieval philosophy? I would guess 2. How many of them have visited Wikipedia looking for more info about Megan Fox? I would guess 10. So, there is a 4-times likelihood that content has been built up around subjects like Megan Fox (which you would probably term "rubbish") than around subjects like the Sum of Logic.


This is not entirely true and if it were I would have no concern about Wikipedia, i.e. if it were perceived as a source of trivia. But I carefully look at the hit rates of the articles I am involved with and while they are not quite in the same league as Britney Spears, some of them are very popular. However the attention they get is minimal compared with Spears (I don't know who Megan Fox is).

So, I am saying that on the 'buy side' there is a lot more interest from the general public in what we would call 'encyclopedic subjects' whereas on the sell side I suspect there is much less. This at a guess is because Wikipedia is largely written by teenagers.
Cedric
Use the "Random article" link to do a sampling. I did a thread on this a long time ago.
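
For anyone wanting to turn such a sampling into a figure, here is a minimal Python sketch. It assumes you have already clicked through some random articles and labelled each one by hand; the label names and the function `estimate_share` are invented for this example, and the 95% margin uses the ordinary normal approximation, which is only reasonable for samples that are not tiny.

```python
import math


def estimate_share(labels, target):
    """Estimate the share of `target` among hand-labelled sample articles.

    Returns (proportion, margin), where margin is the half-width of an
    approximate 95% confidence interval (normal approximation).
    """
    n = len(labels)
    if n == 0:
        raise ValueError("need at least one labelled article")
    p = sum(1 for label in labels if label == target) / n
    margin = 1.96 * math.sqrt(p * (1 - p) / n)
    return p, margin


# Hypothetical sample: 100 random articles, 30 of them tagged "pop culture".
sample = ["pop culture"] * 30 + ["other"] * 70
p, margin = estimate_share(sample, "pop culture")
print(f"estimated share: {p:.0%} +/- {margin:.0%}")
```

The margin shrinks only with the square root of the sample size, so a hundred clicks gives a ballpark and nothing finer.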
Peter Damian
QUOTE(Eva Destruction @ Mon 20th July 2009, 4:41pm) *

[... ]it's also worth noting that any reference work is going to have a disproportionate number of 20th century biographies, because not only were there a lot more people, but record-keeping became systematic for the first time. (The names and biographies of most actors of classical Athens aren't recorded even if someone wanted to list them; the names and biographies of everyone who ever appeared on The Love Boat can be checked and verified.)


Agree but I want to compare the treatment with standard reference works, not with reality. I did a similar exercise with philosophers and the Wikipedia treatment was entirely disproportionate i.e. Ayn Rand was up with Aristotle whereas standard reference works do not even mention Rand.
sbrown
QUOTE(thekohser @ Mon 20th July 2009, 4:26pm) *

Think of 100 people at a soccer match. How many of them have ever visited a Wikipedia page about medieval philosophy? I would guess 2. How many of them have visited Wikipedia looking for more info about Megan Fox? I would guess 10. So, there is a 4-times likelihood that content has been built up around subjects like Megan Fox (which you would probably term "rubbish") than around subjects like the Sum of Logic.

Er ... 10/2 = 5 surely?

QUOTE(Peter Damian @ Mon 20th July 2009, 8:43pm) *

The exception was the strange omission of Richard Flavell - a distinguished British molecular biologist whose entry in Chambers' is as generous as Flaubert's but who is not in Wikipedia at all

You know what wikidiots would say - SOFIXIT.
dtobias
QUOTE(Peter Damian @ Mon 20th July 2009, 11:33am) *

That's excellent, thanks. Presumably there's no way out of doing the '1801, 1802...' thing 100 times?


Write a Perl script!
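
A rough sketch of what such a script might look like, in Python rather than Perl. It assumes the standard MediaWiki API endpoint and its `prop=categoryinfo` query, which reports a `pages` count per category. The helper names, the User-Agent string and the batch size of 50 titles per request are choices made for this example, and nothing here handles errors, rate limits or articles sitting in more than one category.

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"


def birth_categories(first_year, last_year):
    """Build 'Category:1801 births', 'Category:1802 births', ... titles."""
    return [f"Category:{year} births" for year in range(first_year, last_year + 1)]


def sum_category_pages(api_response):
    """Add up the 'pages' counts found in one categoryinfo response."""
    pages = api_response.get("query", {}).get("pages", {})
    return sum(p.get("categoryinfo", {}).get("pages", 0) for p in pages.values())


def fetch_batch(titles):
    """Ask the API for categoryinfo on one batch of category titles."""
    params = urllib.parse.urlencode({
        "action": "query",
        "prop": "categoryinfo",
        "titles": "|".join(titles),
        "format": "json",
    })
    request = urllib.request.Request(
        f"{API}?{params}",
        headers={"User-Agent": "category-count-sketch/0.1 (example)"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def count_births(first_year, last_year, fetch=fetch_batch, batch_size=50):
    """Total the article counts of the yearly birth categories in a range."""
    titles = birth_categories(first_year, last_year)
    total = 0
    for start in range(0, len(titles), batch_size):
        total += sum_category_pages(fetch(titles[start:start + batch_size]))
    return total


if __name__ == "__main__":
    # The "1801, 1802..." thing, done 100 times in two requests.
    print(count_births(1801, 1900))
```

The `fetch` parameter is there so the counting logic can be tried out against canned data without touching the live site.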
thekohser
QUOTE(sbrown @ Mon 20th July 2009, 5:13pm) *

Er ... 10/2 = 5 surely?


Ooph! I was thinking along the lines of (10-2)/2 = 4 ... so, something like "the horn-dog surplus alone is four times greater than the medieval philosopher types".
JohnA
When you have an actual working definition of what constitutes "knowledge" and what constitutes "trivia" then you'll have a handle on how much junk Wikipedia actually has.
Milton Roe
QUOTE(JohnA @ Tue 21st July 2009, 3:11pm) *

When you have an actual working definition of what constitutes "knowledge" and what constitutes "trivia" then you'll have a handle on how much junk Wikipedia actually has.


Oh, ooooo, can I try this one?

Knowledge: information I'm interested in.
Trivia: information I'm NOT interested in.


Your division may differ, of course.