> Content contributors

Peter Damian

My blog post for today http://ocham.blogspot.com/2011/10/repetiti...-wikipedia.html on whether there are statistically measurable properties that distinguish 'content contributors' from wiki-gnomes. Conclusion: the statistical difference is strongly indicative of a real difference, discussed in detail on the blog.

Remaining questions: why do content contributors remain on the project, given that they have a lower status than those who perform repetitive and tedious work?

Easily-learned repetitive labour is nearly always paid less in real life than labour which requires either specialised learning, or some innate but scarce skill. The simple reason for this is supply and demand. Rare or difficult-to-acquire skills are by definition in short supply, and will attract a higher price than common, easily acquired skills (at least, to my simple mind - I don't know any economics).

So why is the situation apparently reversed on Wikipedia? The statistics suggest that the majority of administrators use these low-value skills like vandal reversion, template adding, linking to the Estonian Wikipedia etc. Yet their status on Wikipedia is high, whereas that of 'content contributors' is low.

SB_Johnny

QUOTE(Peter Damian @ Sun 30th October 2011, 9:01am)

So why is the situation apparently reversed on Wikipedia? The statistics suggest that the majority of administrators use these low-value skills like vandal reversion, template adding, linking to the Estonian Wikipedia etc. Yet their status on Wikipedia is high, whereas that of 'content contributors' is low.

To be fair, I suspect at least part of that is because administrative buttons aren't really all that useful for content creators. The pool of people who have endless hours to engage in wikipolitics and chase vandals aren't necessarily the ones who have a deep background from which to contribute to actual encyclopedia-building.

Ottava

QUOTE(SB_Johnny @ Sun 30th October 2011, 9:15am)

QUOTE(Peter Damian @ Sun 30th October 2011, 9:01am)

So why is the situation apparently reversed on Wikipedia? The statistics suggest that the majority of administrators use these low-value skills like vandal reversion, template adding, linking to the Estonian Wikipedia etc. Yet their status on Wikipedia is high, whereas that of 'content contributors' is low.

To be fair, I suspect at least part of that is because administrative buttons aren't really all that useful for content creators. The pool of people who have endless hours to engage in wikipolitics and chase vandals aren't necessarily the ones who have a deep background from which to contribute to actual encyclopedia-building.

You'd be surprised. Editing protected pages, history merges, moving over redirects, suppressing redirects, etc., are all extremely valuable to editing content.

Peter Damian

QUOTE(SB_Johnny @ Sun 30th October 2011, 1:15pm)

QUOTE(Peter Damian @ Sun 30th October 2011, 9:01am)

So why is the situation apparently reversed on Wikipedia? The statistics suggest that the majority of administrators use these low-value skills like vandal reversion, template adding, linking to the Estonian Wikipedia etc. Yet their status on Wikipedia is high, whereas that of 'content contributors' is low.

To be fair, I suspect at least part of that is because administrative buttons aren't really all that useful for content creators. The pool of people who have endless hours to engage in wikipolitics and chase vandals aren't necessarily the ones who have a deep background from which to contribute to actual encyclopedia-building.

I didn't understand the 'to be fair' bit.

communicat

Peter/Edward, don't know if you've come across this, written by a fellow logician. It might or might not answer some of your questions, and it provides some useful references.
http://knol.google.com/k/carl-hewitt/corru...ip_by_Wikipedia

Peter Damian

QUOTE(communicat @ Sun 30th October 2011, 4:19pm)

Peter/Edward, don't know if you've come across this, written by a fellow logician. It might or might not answer some of your questions, and it provides some useful references.
http://knol.google.com/k/carl-hewitt/corru...ip_by_Wikipedia

Thanks, but yes, actually I am familiar with that one.

radek

QUOTE(Peter Damian @ Sun 30th October 2011, 8:01am)

My blog post for today http://ocham.blogspot.com/2011/10/repetiti...-wikipedia.html on whether there are statistically measurable properties that distinguish 'content contributors' from wiki-gnomes. Conclusion: the statistical difference is strongly indicative of a real difference, discussed in detail on the blog.

Remaining questions: why do content contributors remain on the project, given that they have a lower status than those who perform repetitive and tedious work?

Easily-learned repetitive labour is nearly always paid less in real life than labour which requires either specialised learning, or some innate but scarce skill. The simple reason for this is supply and demand. Rare or difficult-to-acquire skills are by definition in short supply, and will attract a higher price than common, easily acquired skills (at least, to my simple mind - I don't know any economics).

So why is the situation apparently reversed on Wikipedia? The statistics suggest that the majority of administrators use these low-value skills like vandal reversion, template adding, linking to the Estonian Wikipedia etc. Yet their status on Wikipedia is high, whereas that of 'content contributors' is low.

I think you already answered your own question - supply and demand, while always present, lead to the outcomes you describe (higher paid for scarcer skills) only in functioning markets. Wikipedia is not a market. So rewards are not necessarily related to productivity or usefulness but rather determined through a messy social and political process (who's got what friends).

QUOTE(Ottava @ Sun 30th October 2011, 8:17am)

QUOTE(SB_Johnny @ Sun 30th October 2011, 9:15am)

QUOTE(Peter Damian @ Sun 30th October 2011, 9:01am)

So why is the situation apparently reversed on Wikipedia? The statistics suggest that the majority of administrators use these low-value skills like vandal reversion, template adding, linking to the Estonian Wikipedia etc. Yet their status on Wikipedia is high, whereas that of 'content contributors' is low.

To be fair, I suspect at least part of that is because administrative buttons aren't really all that useful for content creators. The pool of people who have endless hours to engage in wikipolitics and chase vandals aren't necessarily the ones who have a deep background from which to contribute to actual encyclopedia-building.

You'd be surprised. Editing protected pages, history merges, moving over redirects, suppressing redirects, etc., are all extremely valuable to editing content.

Eh, they may be useful but are neither necessary nor even "extremely valuable".

Peter Damian

QUOTE(radek @ Sun 30th October 2011, 4:31pm)

Wikipedia is not a market.

That's interesting because analysis of Wales' early posts to the lists in 2001 suggests a market economy was exactly what he had in mind. That's why he was so heavy on not biasing the outcome by having content committees or editors in chief and so on.

QUOTE

QUOTE(Ottava @ Sun 30th October 2011, 8:17am)

You'd be surprised. Editing protected pages, history merges, moving over redirects, suppressing redirects, etc., are all extremely valuable to editing content.

Eh, they may be useful but are neither necessary nor even "extremely valuable".

Ottava is not given to irony, but I assumed he was here.

radek

QUOTE(Peter Damian @ Sun 30th October 2011, 8:01am)

My blog post for today http://ocham.blogspot.com/2011/10/repetiti...-wikipedia.html on whether there are statistically measurable properties that distinguish 'content contributors' from wiki-gnomes. Conclusion: the statistical difference is strongly indicative of a real difference, discussed in detail on the blog.

Remaining questions: why do content contributors remain on the project, given that they have a lower status than those who perform repetitive and tedious work?

Easily-learned repetitive labour is nearly always paid less in real life than labour which requires either specialised learning, or some innate but scarce skill. The simple reason for this is supply and demand. Rare or difficult-to-acquire skills are by definition in short supply, and will attract a higher price than common, easily acquired skills (at least, to my simple mind - I don't know any economics).

So why is the situation apparently reversed on Wikipedia? The statistics suggest that the majority of administrators use these low-value skills like vandal reversion, template adding, linking to the Estonian Wikipedia etc. Yet their status on Wikipedia is high, whereas that of 'content contributors' is low.

Oh yeah Peter, one thing. Your methodology will overestimate "content creation" by admins for ones who hang out mostly at AN/I and AE. More precisely, their high edits per page will come from them posting frequently to these drama boards, rather than working on articles.

There's probably some bias on the other end too. Someone like Piotrus has an edits-per-page number of 3.75, which is somewhere in the middle. But that's because the guy creates LOTS of pages and works a LOT on each of them. So there you'd have to control for total number of edits.

Actually, I think you could somehow use the Namespace Totals % which are given to separate out the repeated edits to actual articles vs. repeated edits to drama boards and users' talk pages. That would give a more accurate and relevant ratio for your purposes (I'd have to think for a few minutes how to do it, which I might).

Ottava

QUOTE(communicat @ Sun 30th October 2011, 12:19pm)

Peter/Edward, don't know if you've come across this, written by a fellow logician. It might or might not answer some of your questions, and it provides some useful references.
http://knol.google.com/k/carl-hewitt/corru...ip_by_Wikipedia

Hewitt doesn't understand that the Stewards haven't respected the Foundation since 2008 and that very few people ever actually listened to Jimbo to begin with. (Not to say that the few who did weren't powerful, but the Commons matter shows that Jimbo was only given some power when he was toeing the party line).

Peter

QUOTE

Ottava is not given to irony, but I assumed he was here.

I was being 100% honest. History merges were an annoying thing that was utterly important to me many times. It is annoying to have to go find an admin, link the different pages, and hope they get it right instead of being able to do a bunch of history merges in a row. Remember, I was building a dozen or so articles on average in my user space and then moving them out where many of them had articles. The merging of histories was a valuable addition.

Also, seeing deleted pages is important if you are trying to see what was there before when trying to recreate something. Editing protected pages is good for when you are working on different pages, want to update DYK queue, etc. Importing is another feature that I used. Suppressing redirect was handy quite regularly to me. And it is annoying if you want to move out a page to something that is a redirect already.

radek

QUOTE(Peter Damian @ Sun 30th October 2011, 12:23pm)

QUOTE(radek @ Sun 30th October 2011, 4:31pm)

Wikipedia is not a market.

That's interesting because analysis of Wales' early posts to the lists in 2001 suggests a market economy was exactly what he had in mind. That's why he was so heavy on not biasing the outcome by having content committees or editors in chief and so on.

Well, he can make whatever crappy analogies he wants to, but it still ain't. I think this just shows that Jimmy doesn't have much of a clue of what a market is or how it functions.

Peter Damian

QUOTE(radek @ Sun 30th October 2011, 5:25pm)

Oh yeah Peter, one thing. Your methodology will overestimate "content creation" by admins for ones who hang out mostly at AN/I and AE. More precisely, their high edits per page will come from them posting frequently to these drama boards, rather than working on articles.

There's probably some bias on the other end too. Someone like Piotrus has an edits-per-page number of 3.75, which is somewhere in the middle. But that's because the guy creates LOTS of pages and works a LOT on each of them. So there you'd have to control for total number of edits.

Actually, I think you could somehow use the Namespace Totals % which are given to separate out the repeated edits to actual articles vs. repeated edits to drama boards and user's talk pages. That would give a more accurate and relevant ratio for your purposes (I'd have to think for a few minutes how to do it which I might)

Quite correct. That is evident from the pie chart - certain editors have a high 'blue' proportion, which is the WP: prefixed pages. There is no way round that except by selective querying of the database to get only article contributions.

And there are many other ways this figure is skewed. E.g. YellowMonkey has the highest number of FAs, yet a (relatively) low average e.p.p. of 3.69. All I can hope to give is a blunt figure that shows some correlation with our intuitive idea of 'content', namely something that cannot be produced by flitting from page to page, and which requires a long look at a single article, concerning the summary, the meaning of the parts.

Yes, you could use the % of namespace totals as a proxy, but I can think of several reasons why that might be skewed.

At the end of the day, I am trying to give one of many reasons why the concept of 'crowdsourcing' is badly flawed.

QUOTE

But that's because the guy creates LOTS of pages and works a LOT on each of them. So there you'd have to control for total number of edits.

I don't agree with that. If I create 100 pages and give 100 edits to each page, that's a very high e.p.p. of 100. Piotrus is probably contaminating his content work with mechanical repetitive editing. Which I understand well, because I relieve the writer's block doldrums with such activity myself.
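To make that arithmetic concrete, here is a minimal sketch of how e.p.p. behaves when content work is mixed with one-off edits. The function name and the second editor's numbers are made up for illustration; only the 100-pages-times-100-edits case comes from the paragraph above.

```python
def edits_per_page(edits_by_page):
    """Average edits per distinct page: total edits / number of pages touched."""
    total_edits = sum(edits_by_page.values())
    return total_edits / len(edits_by_page)


# Case from the text: 100 pages, 100 edits each.
focused = {f"article_{i}": 100 for i in range(100)}

# Hypothetical mixed editor: the same content work, plus a single
# mechanical edit on each of 9,900 other pages (illustrative numbers only).
mixed = dict(focused)
mixed.update({f"gnome_{i}": 1 for i in range(9900)})

print(edits_per_page(focused))  # 100.0
print(edits_per_page(mixed))    # 1.99
```

The content work is identical in both cases; the one-off edits alone pull the figure from 100 down to about 2, which is the 'contamination' effect described above.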

QUOTE(radek @ Sun 30th October 2011, 5:53pm)

Well, he can make whatever crappy analogies he wants to, but it still ain't. I think this just shows that Jimmy doesn't have much of a clue of what a market is or how it functions.

Well, he did publish a peer-reviewed paper on options pricing as part of his Ph.D., so he can't be a complete dunce. I think there are other explanations for why he said those things.

QUOTE(Ottava @ Sun 30th October 2011, 5:47pm)

QUOTE

Ottava is not given to irony, but I assumed he was here.

I was being 100% honest.

OK so I was right about the bit before the 'but'.

thekohser

QUOTE(radek @ Sun 30th October 2011, 12:31pm)

Wikipedia is not a market.

For most editors, no, it's not.

Me, though... I just received another $100 PayPal payment for some fairly simple work on Wikipedia.

communicat

PeterEdward, in my experience there's another category of wikipedian apart from admins with low-value skills and actual 'content contributors'. I'm referring of course to the category of "supervisor", namely the fact that for every content contributor there seem to be at least three or four extremely tedious and irritating "supervisors", not necessarily admins or productive editors, who constantly nit-pick and tell the content contributor how they think the edit should be done or what should or should not be included. Needless to say, these "supervisors" never, but never, make any edits or content contributions of their own. (Possibly because they already know from some past experience how much shit they'd have to put up with if ever they did try to make a useful contribution).

radek

QUOTE

Quite correct. That is evident from the pie chart - certain editors have a high 'blue' proportion, which is the WP: prefixed pages. There is no way round that except by selective querying of the database to get only article contributions.

Well, there's no perfect way of doing it but you could just subtract off the blue to get a probably better estimate.
So edits per article page would be (1-(%wikipedia+%wikipedia talk))*average edits per page

Ideally you'd want to adjust the number of "pages" as well by subtracting AN/I and AE or whatever, but since there aren't that many of these pages it won't get too skewed.

The only possible exception is FAR pages which also count as "wikipedia" (blue) even though a lot of that is obviously content related.

The real difficulty is adjusting for # of edits on user talk, since there's no way to tell how many different user talk pages a particular person posted to. And a lot of these admins basically spend the majority of their time politickin' on each other's talk pages so that's really something which should be taken into account. For example, Fetchcommons has 28.26% of his posts to user talk. Sandstein has 24.71%, SarekOfVulcan has 24.09%, Jehochman (who has a pretty high average edits per page - but that's not because he edits articles a lot) 28.93%, Georgewilliamherbert 34.69%, BWilkins 39.26%, etc.
For comparison, only 6.62% of my edits are to user talk.

So none of the above have anything to do with average edits per ARTICLE page. Again, the difficulty is in adjusting both the numerator and denominator here.

Still I think the formula above would give a somewhat better picture of actual edits per article page.
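As a rough sketch of that adjustment (the function name and the example figures are made up for illustration; only the formula itself is the one given above):

```python
def article_epp_estimate(avg_epp, pct_wikipedia, pct_wikipedia_talk):
    """Scale average edits per page by the share of edits made outside
    the Wikipedia: and Wikipedia talk: ("blue") namespaces.

    Percentages are numbers out of 100, as an edit counter reports them.
    """
    blue_share = (pct_wikipedia + pct_wikipedia_talk) / 100.0
    if not 0.0 <= blue_share <= 1.0:
        raise ValueError("namespace percentages must sum to between 0 and 100")
    return (1.0 - blue_share) * avg_epp


# Hypothetical editor: e.p.p. of 8.0, with 40% + 10% of edits in the blue namespaces.
print(article_epp_estimate(8.0, 40.0, 10.0))  # 4.0
```

Note that this only rescales the numerator. The page count in the denominator is left alone, and user talk edits are not touched at all, so it is an approximation at best.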

QUOTE

And there are many other ways this figure is skewed. E.g. YellowMonkey has the highest number of FAs, yet a (relatively) low average e.p.p. of 3.69. All I can hope to give is a blunt figure that shows some correlation with our intuitive idea of 'content', namely something that cannot be produced by flitting from page to page, and which requires a long look at a single article, concerning the summary, the meaning of the parts.

Yes, and some people will work on articles on their word processor or sandbox and then just post the ready thing. Others (like me) like to do it bit by bit. So the measure is obviously going to miss that.

QUOTE

Yes, you could use the % of namespace totals as a proxy, but I can think of several reasons why that might be skewed.

At the end of the day, I am trying to give one of many reasons why the concept of 'crowdsourcing' is badly flawed.

Well, any statistic summarizes information, almost by definition. And when you summarize information, by definition, you're going to lose some information (the only alternative is to somehow look at every single edit ever made at Wikipedia simultaneously). That doesn't mean that describing data with statistics is useless.

Peter Damian

QUOTE(communicat @ Sun 30th October 2011, 6:29pm)

PeterEdward, in my experience there's another category of wikipedian apart from admins with low-value skills and actual 'content contributors'. I'm referring of course to the category of "supervisor", namely the fact that for every content contributor there seem to be at least three or four extremely tedious and irritating "supervisors", not necessarily admins or productive editors, who constantly nit-pick and tell the content contributor how they think the edit should be done or what should or should not be included. Needless to say, these "supervisors" never, but never, make any edits or content contributions of their own. (Possibly because they already know from some past experience how much shit they'd have to put up with if ever they did try to make a useful contribution).

Isn't this similar to the way the Red Army used to have 'political officers'?

Silver seren

How would you account for the people that work on making articles in their user subspace and then submit them whole to the mainspace in a single edit? They may end up being the ones with the lowest number of edits to an article, but actually contributed almost all of the content.

Peter Damian

QUOTE(Silver seren @ Sun 30th October 2011, 9:02pm)

How would you account for the people that work on making articles in their user subspace and then submit them whole to the mainspace in a single edit? They may end up being the ones with the lowest number of edits to an article, but actually contributed almost all of the content.

Yes of course there are 101 ways in which this number could fail to have the meaning it may have. But then Giano tends to edit in his own space in the way you describe, yet he has one of the highest epp's.

All we can say, and all we need to say is that:

1. In general, editors with low epp's tend to perform relatively mechanical low economic value easily learned tasks. We can verify this by looking at their actual contributions. Editors with high epp's tend to be those with lots of FA and GA stars on their page, and who are generally and anecdotally known as so-called content contributors. That proves there is a division of labour in Wikipedia.

2. Low epp's predominate in the admin corps. Hardly surprising, given that the qualities required of an admin are precisely low-value, repetitive tasks, and given that RfA tends to emphasise quantity rather than quality of edits.

3. The theory of crowdsourcing says that this shouldn't happen.

Ottava

QUOTE(radek @ Sun 30th October 2011, 4:28pm)

Well, there's no perfect way of doing it but you could just subtract off the blue to get a probably better estimate.
So edits per article page would be (1-(%wikipedia+%wikipedia talk))*average edits per page

Ideally you'd want to adjust the number of "pages" as well by subtracting AN/I and AE or whatever, but since there aren't that many of these pages it won't get too skewed.

The only possible exception is FAR pages which also count as "wikipedia" (blue) even though a lot of that is obviously content related.

The real difficulty is adjusting for # of edits on user talk, since there's no way to tell how many different user talk pages a particular person posted to. And a lot of these admins basically spend the majority of their time politickin' on each other's talk pages so that's really something which should be taken into account. For example, Fetchcommons has 28.26% of his posts to user talk. Sandstein has 24.71%, SarekOfVulcan has 24.09%, Jehochman (who has a pretty high average edits per page - but that's not because he edits articles a lot) 28.93%, Georgewilliamherbert 34.69%, BWilkins 39.26%, etc.
For comparison, only 6.62% of my edits are to user talk.

So none of the above have anything to do with average edits per ARTICLE page. Again, the difficulty is in adjusting both the numerator and denominator here.

Still I think the formula above would give a somewhat better picture of actual edits per article page.

Just for curiosity's sake: my user talk page percentage was 30.02%. My article percentage was 25.53%. I made an average of 8.61 edits per page.

I also participated in over 300 different FAC reviews ("Wikipedia" page) and many DYK related matters (also "Wikipedia" page related).

I have a feeling that you might want to break down where exactly the people are editing. Perhaps a much better way of determining "content" contributors are those who add large amounts of bytes to an article that aren't part of an undo? (It would be hard to remove all of the undos though).
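A minimal sketch of the bytes-added idea, assuming each revision record already carries a flag saying whether it is an undo. The record layout, field names and sample data are all hypothetical, and producing that flag reliably is exactly the hard part mentioned above.

```python
def net_bytes_added(revisions, user):
    """Sum the positive size changes made by `user`, skipping any
    revision flagged as an undo/revert."""
    total = 0
    for rev in revisions:
        if rev["user"] != user or rev["is_revert"]:
            continue
        delta = rev["size_after"] - rev["size_before"]
        if delta > 0:
            total += delta
    return total


# Made-up history of one article (sizes in bytes).
revisions = [
    {"user": "A", "size_before": 0, "size_after": 5000, "is_revert": False},
    {"user": "B", "size_before": 5000, "size_after": 0, "is_revert": False},  # page blanked
    {"user": "A", "size_before": 0, "size_after": 5000, "is_revert": True},   # undo of the blanking
    {"user": "A", "size_before": 5000, "size_after": 5200, "is_revert": False},
]

print(net_bytes_added(revisions, "A"))  # 5200
print(net_bytes_added(revisions, "B"))  # 0
```

Without the undo filter, editor A would be credited with 10,200 bytes here, half of it for simply restoring text that was already written.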

radek

QUOTE(Peter Damian @ Sun 30th October 2011, 4:12pm)

QUOTE(Silver seren @ Sun 30th October 2011, 9:02pm)

How would you account for the people that work on making articles in their user subspace and then submit them whole to the mainspace in a single edit? They may end up being the ones with the lowest number of edits to an article, but actually contributed almost all of the content.

Yes of course there are 101 ways in which this number could fail to have the meaning it may have. But then Giano tends to edit in his own space in the way you describe, yet he has one of the highest epp's.

All we can say, and all we need to say is that:

1. In general, editors with low epp's tend to perform relatively mechanical low economic value easily learned tasks. We can verify this by looking at their actual contributions. Editors with high epp's tend to be those with lots of FA and GA stars on their page, and who are generally and anecdotally known as so-called content contributors. That proves there is a division of labour in Wikipedia.

2. Low epp's predominate in the admin corps. Hardly surprising, given that the qualities required of an admin are precisely low-value, repetitive tasks, and given that RfA tends to emphasise quantity rather than quality of edits.

3. The theory of crowdsourcing says that this shouldn't happen.

As I mention above, after seeing Jehochman's epp (7.46) I disagree with the third sentence of 1, though I'm not sure how indicative this is on average. Basically you DO have to control somehow for % of edits to actual articles vs. other categories of Wikipedia pages.

If there was data you could do some regressions here:

1. Dependent variable is a 0/1 dummy for whether a person is an admin or a non-admin. Independent variables are epp, % edits to articles space etc. Run this as a Probit or Logit.

2. Construct a measure of whether a person is a "content creator" by, say, counting up their GAs, FAs and maybe DYKs and just non-redirect articles, weighting these in some way (which would be arbitrary but you could change the weighting to do robustness checks). Then correlate that with epp and % edits to article space.
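Purely to illustrate the first regression, here is a minimal logit fitted by plain gradient ascent on the log-likelihood. The ten data rows are invented placeholders, not real editors, and the learning rate and step count are arbitrary choices; with real data one would normally use a statistics package rather than hand-rolled code like this.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_likelihood(weights, rows, labels):
    # Sum of y*z - log(1 + exp(z)), written in an overflow-safe form.
    total = 0.0
    for x, y in zip(rows, labels):
        z = sum(w * xi for w, xi in zip(weights, x))
        total += y * z - (max(z, 0.0) + math.log1p(math.exp(-abs(z))))
    return total


def fit_logit(rows, labels, lr=0.05, steps=5000):
    # Gradient ascent on the mean log-likelihood, starting from all zeros.
    weights = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(steps):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = y - sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            for j, xi in enumerate(x):
                grad[j] += err * xi
        weights = [w + lr * g / n for w, g in zip(weights, grad)]
    return weights


# Invented rows: [intercept, epp, share of edits in article space].
ROWS = [
    [1.0, 1.5, 0.30], [1.0, 2.0, 0.55], [1.0, 2.5, 0.40],
    [1.0, 5.0, 0.35], [1.0, 9.0, 0.75],   # labelled admin below
    [1.0, 2.0, 0.40], [1.0, 3.0, 0.60], [1.0, 6.0, 0.70],
    [1.0, 8.0, 0.80], [1.0, 9.5, 0.65],   # labelled non-admin below
]
LABELS = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # 1 = admin, 0 = non-admin

weights = fit_logit(ROWS, LABELS)
print("intercept, epp, article share:", weights)
print("log-likelihood:", log_likelihood(weights, ROWS, LABELS))
```

The sign and size of the epp coefficient are what you would look at. The second exercise (a weighted GA/FA/DYK score correlated with epp) is not covered by this sketch.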

Overall I don't think the idea that there's "division of labor" on Wikipedia is controversial though. And some of that may even be justified. The problem is with the differential awards and over (under) supply of one particular type relative to the other.

Edit: or as another counter example take Baseball Bugs. His epp is 10.63. But we all know that's only because he just edits AN/I more or less. Yet a simple measure such as yours would put him in a category of "content creator".

(As a further aside, in that Dr. Blofeld discussion that was linked, some moron objects to people objecting to Dr. Blofeld's mass creation of one-sentence stubs because "we shouldn't interfere with the work of content creators". In other words, lots of these idiots actually think that auto-creating thousands of one-sentence, next-to-useless stubs is "content creation"!)

Peter Damian

QUOTE(Ottava @ Sun 30th October 2011, 9:18pm)

I have a feeling that you might want to break down where exactly the people are editing. Perhaps a much better way of determining "content" contributors are those who add large amounts of bytes to an article that aren't part of an undo? (It would be hard to remove all of the undos though).

It all depends what you want to prove. I am trying simply to see if there is a simple statistical measure that suggests, with some degree of confidence, that there is a division of labour between 'content contributors' and 'gnomes'. We all know anecdotally that this exists, but here is an objective measure. The fact that the measure, like all statistical measures, only shows this with a certain degree of confidence, but no absolute certainty, does not matter. To be sure, some people we think of as content contributors have gnome-like characteristics (e.g. YellowMonkey). But we know that too, from his edits, when he was active.

The other point is that low epp count is indicative of low-value added - monotonous, easily learned repetitive labour.

The final point is that this low-value labour gives you high status on Wikipedia, unlike in the real world.

Peter Damian

QUOTE(radek @ Sun 30th October 2011, 9:24pm)

As I mention above, after seeing Jehochman's epp (7.46) I disagree with the third sentence of 1, though I'm not sure how indicative this is on average. Basically you DO have to control somehow for % of edits to actual articles vs. other categories of Wikipedia pages.

I looked at his edits and he has a large percentage of 'blue' (Wikipedia: pages) which suggests he is part of the peanut gallery. I'm not disagreeing - it's an 'in general' thing. I looked at 720 admin editors and tried in each case of > 4 to explain why it was higher. In nearly all cases the person has a hobby of caterpillars or asteroids, or has FA and GA stars. In most cases of <4, this is not the case. In nearly every case of < 2 the person either is a bot, or acts like one.

Interesting that David Gerard got the second lowest score, I should have mentioned that earlier :|

QUOTE

Edit: or as another counter example take Baseball Bugs. His epp is 10.63. But we all know that's only because he just edits AN/I more or less. Yet a simple measure such as yours would put him in a category of "content creator"

Agree again. With all statistical measures, we see if there is broad agreement, look for anomalies, then try and explain them.

I will do this study again some time, but using the tool to check 720 editors takes exactly 2 days. Access to the database would be wonderful.

radek

QUOTE(Peter Damian @ Sun 30th October 2011, 4:32pm)

QUOTE(radek @ Sun 30th October 2011, 9:24pm)

As I mention above, after seeing Jehochman's epp (7.46) I disagree with the third sentence of 1, though I'm not sure how indicative this is on average. Basically you DO have to control somehow for % of edits to actual articles vs. other categories of Wikipedia pages.

I looked at his edits and he has a large percentage of 'blue' (Wikipedia: pages) which suggests he is part of the peanut gallery. I'm not disagreeing - it's an 'in general' thing. I looked at 720 admin editors and tried in each case of > 4 to explain why it was higher. In nearly all cases the person has a hobby of caterpillars or asteroids, or has FA and GA stars. In most cases of <4, this is not the case. In nearly every case of < 2 the person either is a bot, or acts like one.

Interesting that David Gerard got the second lowest score, I should have mentioned that earlier :|

QUOTE

Edit: or as another counter example take Baseball Bugs. His epp is 10.63. But we all know that's only because he just edits AN/I more or less. Yet a simple measure such as yours would put him in a category of "content creator"

Agree again. With all statistical measures, we see if there is broad agreement, look for anomalies, then try and explain them.

I will do this study again some time, but using the tool to check 720 editors takes exactly 2 days. Access to the database would be wonderful.

I think you have uncovered a certain asymmetric pattern here: low epp --> "gnomish edits" or "useless crap" but certainly not "content". High epp --> it depends.

I brought up the counter examples above simply because I'm wondering how much of the pattern that is and if it could somehow be controlled for. High % "blue pages" and % "user talk" I think would be good indicators that a particular editor with a high epp is in the "peanut gallery" category, not the "content creator" category.
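Spelled out as a rule of thumb, that might look something like the sketch below. The epp cutoff of 4 is the one Peter used in his post; the 50% discussion cutoff and the example inputs are arbitrary placeholders, not measured values.

```python
def classify_editor(epp, pct_wikipedia, pct_user_talk,
                    epp_cutoff=4.0, discussion_cutoff=50.0):
    """Crude three-way label from epp and namespace percentages (out of 100)."""
    if epp < epp_cutoff:
        return "gnome"
    # High epp: check how much of it comes from project and user talk pages.
    if pct_wikipedia + pct_user_talk >= discussion_cutoff:
        return "peanut gallery"
    return "content creator"


# Hypothetical editors, not real figures.
print(classify_editor(1.8, 10.0, 5.0))    # gnome
print(classify_editor(10.0, 60.0, 20.0))  # peanut gallery
print(classify_editor(12.0, 5.0, 8.0))    # content creator
```

This keeps the asymmetry: a low epp is taken at face value, while a high epp only counts as content work once the namespace shares have been checked.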

Peter Damian

For the record, here are the top 20 scorers. Most of them are consistent with the hypothesis. Even FT2, whose top articles were Zoophilia, Labrador Retriever and Neurolinguistic Programming.

Marine 69-71 is Tony the Marine.

Zero0000 7.28
Maunus 7.36
Jmh649 7.38
Jehochman 7.46
Happyme22 7.77
AnemoneProjectors 7.87
Cailil 7.97
Masem 8.05
Mike Cline 8.23
Stephan Schulz 8.28
Cbl62 9.05
Gatoclass 9.2
FT2 9.27
Gwen Gale 9.27
SlimVirgin 9.4
Slrubenstein 9.66
COGDEN 9.67
Marine 69-71 10.08
Moni3 12.72
Wehwalt 20.51

Peter Damian

QUOTE(radek @ Sun 30th October 2011, 9:37pm)

I think you have uncovered a certain asymmetric pattern here: low epp --> "gnomish edits" or "useless crap" but certainly not "content". High epp --> it depends.

It does depend, but if you look at the actual top 20, with very few exceptions, they don't edit 'blue pages'. Hochman is the only one, I think.

Could it or should it be controlled? Only if it occurs significantly across much of the sample. Here, I think we can note it and pass on.

The anomalies are actually in the 2-3 region where content contributors also engage in regular frenetic 'gnoming'.

On Baseball Bugs, I did another study a few months ago of edits over 2 years to ANI. He came out way ahead of anyone else and is, again, probably an anomaly.

communicat

QUOTE(Peter Damian @ Sun 30th October 2011, 11:01pm)

QUOTE(communicat @ Sun 30th October 2011, 6:29pm)

PeterEdward, in my experience there's another category of wikipedian apart from admins with low-value skills and actual 'content contributors'. I'm referring of course to the category of "supervisor", namely the fact that for every content contributor there seem to be at least three or four extremely tedious and irritating "supervisors", not necessarily admins or productive editors, who constantly nit-pick and tell the content contributor how they think the edit should be done or what should or should not be included. Needless to say, these "supervisors" never, but never, make any edits or content contributions of their own. (Possibly because they already know from some past experience how much shit they'd have to put up with if ever they did try to make a useful contribution).

Isn't this similar to the way the Red Army used to have 'political officers'?

I see no convincing comparison or correlation between the Red Army's political commissars and WP's self-appointed supervisors. But if it's correlations you're after, try the one that exists between the decline of the American-dominated WP and the decline of the American economy -- (not to mention the decline in American international prestige following its disastrous interventions in Iraq and Afghanistan, and the disaster that's sure to follow in newly "liberated" Libya). Very few people outside of America attach much credibility these days to anything perceived to be American or American-based (including even or especially WP).

Peter Damian

QUOTE(communicat @ Sun 30th October 2011, 9:47pm)

I see no convincing comparison or correlation between the Red Army's political commissars and WP's self-appointed supervisors.

That reminds me of another study I need to complete. I was looking at regularly blocked content creators such as Malleus, Giano and, er, myself.

There was a regular pattern of one admin blocks for some supposed offence, and then another admin unblocks. You can easily put this into a table with 'block' on one side and 'unblock' on another.

Now, if we were to plot the binary block/unblock against epp, what would we get, I wonder?

Suggestions or guesses please.

radek

QUOTE(Peter Damian @ Sun 30th October 2011, 4:45pm)

QUOTE(radek @ Sun 30th October 2011, 9:37pm)

I think you have uncovered a certain asymmetric pattern here: low epp --> "gnomish edits" or "useless crap" but certainly not "content". Hi epp --> it depends.

It does depend, but if you look at the actual top 20, with very few exceptions, they don't edit 'blue pages'. Hochman is the only one, I think.

Could it or should it be controlled? Only if it occurs significantly across much of the sample. Here, I think we can note it and pass on.

The anomalies are actually in the 2-3 region where content contributors also engage in regular frenetic 'gnoming'.

On Baseball Bugs, I did another study a few months ago of edits over 2 years to ANI. He came out way ahead of anyone else and is, again, probably an anomaly.

Well, the other one that you should include is "purple" pages (User talk). But yes, there are some patterns here.

Here, I made a matrix (and uploaded it to commons) which I think sort of describes what is going on, though obviously we haven't got the data to confirm ALL the cells in it:

You could graph some "famous" editors on that matrix like in those libertarian "economics/social values" graphs people put on their userpages. I expect that'd be pretty funny AND informative.

(and on that note, I'm sort of wondering if there's a way to randomly sample editors (say, those with more than 1000 edits) in a way similar to the Random Article feature)

Peter Damian

QUOTE(radek @ Sun 30th October 2011, 10:00pm)

QUOTE(Peter Damian @ Sun 30th October 2011, 4:45pm)

QUOTE(radek @ Sun 30th October 2011, 9:37pm)

I think you have uncovered a certain asymmetric pattern here: low epp --> "gnomish edits" or "useless crap" but certainly not "content". Hi epp --> it depends.

It does depend, but if you look at the actual top 20, with very few exceptions, they don't edit 'blue pages'. Hochman is the only one, I think.

Could it or should it be controlled? Only if it occurs significantly across much of the sample. Here, I think we can note it and pass on.

The anomalies are actually in the 2-3 region where content contributors also engage in regular frenetic 'gnoming'.

On Baseball Bugs, I did another study a few months ago of edits over 2 years to ANI. He came out way ahead of anyone else and is, again, probably an anomaly.

Well, the other one that you should include is "purple" pages (User talk). But yes, there are some patterns here.

Here, I made a matrix (and uploaded it to commons) which I think sort of describes what is going on, though obviously we haven't got the data to confirm ALL the cells in it:

You could graph some "famous" editors on that matrix like in those libertarian "economics/social values" graphs people put on their userpages. I expect that'd be pretty funny AND informative.

I think that hits the nail on the head.

However, under 'content creators' there is a further subdivision into those who create long and boring articles sourced from weather and hurricane reports, or articles about video games, and those who don't.

There's very little remainder, actually.

radek

QUOTE(Peter Damian @ Sun 30th October 2011, 5:05pm)

QUOTE(radek @ Sun 30th October 2011, 10:00pm)

QUOTE(Peter Damian @ Sun 30th October 2011, 4:45pm)

QUOTE(radek @ Sun 30th October 2011, 9:37pm)

I think you have uncovered a certain asymmetric pattern here: low epp --> "gnomish edits" or "useless crap" but certainly not "content". Hi epp --> it depends.

It does depend, but if you look at the actual top 20, with very few exceptions, they don't edit 'blue pages'. Hochman is the only one, I think.

Could it or should it be controlled? Only if it occurs significantly across much of the sample. Here, I think we can note it and pass on.

The anomalies are actually in the 2-3 region where content contributors also engage in regular frenetic 'gnoming'.

On Baseball Bugs, I did another study a few months ago of edits over 2 years to ANI. He came out way ahead of anyone else and is, again, probably an anomaly.

Well, the other one that you should include is "purple" pages (User talk). But yes, there are some patterns here.

Here, I made a matrix (and uploaded it to commons) which I think sort of describes what is going on, though obviously we haven't got the data to confirm ALL the cells in it:

You could graph some "famous" editors on that matrix like in those libertarian "economics/social values" graphs people put on their userpages. I expect that'd be pretty funny AND informative.

I think that hits the nail on the head.

However, under 'content creators' there is a further subdivision into those who create long and boring articles sourced from weather and hurricane reports, or articles about video games, and those who don't.

There's very little remainder, actually.

Looking at some editors epps and %s, an allowance should be made for people who run certain projects. For example both Gatoclass and SandyGeorgia would show up in the "Drama Queens" category. They have high epps because they post a lot to the same project page (DYKs and GAs) and low % article space counts for the very same reason (or because they post to user talk to notify people that their articles are being reviewed/approved etc.)

Ottava

QUOTE(radek @ Sun 30th October 2011, 6:00pm)

Here, I made a matrix (and uploaded it to commons) which I think sort of describes what is going on, though obviously we haven't got the data to confirm ALL the cells in it:

You could graph some "famous" editors on that matrix like in those libertarian "economics/social values" graphs people put on their userpages. I expect that'd be pretty funny AND informative.

(and on that note, I'm sort of wondering if there's a way to randomly sample editors (say, those with more than 1000 edits) in a way similar to the Random Article feature)

My percentage in Articles was less than 30%. I still think you are forgetting WP:DYK, WP:GAN, WP:FAC, which moves edits from "article" or "article talk" to Wikipedia. Nevermind, you mentioned that in your next post.

By the way, Gatoclass writes very little actual content. He is just an admin that latched onto DYK and used it as his little territory. SandyGeorgia does some article work but very little anymore.

radek

QUOTE(Ottava @ Sun 30th October 2011, 5:52pm)

QUOTE(radek @ Sun 30th October 2011, 6:00pm)

Here, I made a matrix (and uploaded it to commons) which I think sort of describes what is going on, though obviously we haven't got the data to confirm ALL the cells in it:

You could graph some "famous" editors on that matrix like in those libertarian "economics/social values" graphs people put on their userpages. I expect that'd be pretty funny AND informative.

(and on that note, I'm sort of wondering if there's a way to randomly sample editors (say, those with more than 1000 edits) in a way similar to the Random Article feature)

My percentage in Articles was less than 30%. I still think you are forgetting WP:DYK, WP:GAN, WP:FAC, which moves edits from "article" or "article talk" to Wikipedia. Nevermind, you mentioned that in your next post.

By the way, Gatoclass writes very little actual content. He is just an admin that latched onto DYK and used it as his little territory. SandyGeorgia does some article work but very little anymore.

You are, for once, right on this. I'm actually taking down some of this data for various people and you come up as ~~a "Someone who uses Wikipedia as Facebook"~~ but I don't think you were that - well, not that much - correction, you come up as "Drama Queen"... hmm, maybe not that far off. This is actually very similar to the problem that someone like SandyGeorgia comes up as indistinguishable along these two dimensions from someone like Baseball Bugs. And all of that has to do with the fact that the soxred data does not distinguish between "Posting to AN/I way too much" from "Reviewing GAs and FAs" - it counts both under "Wikipedia" but qualitatively these are very different things.

So... I'm still tweaking it. If anyone can point me to a statistic which would allow me to distinguish "Posting to ANI way too much" from "Reviewing GAs" (or similar) kind of people then I would appreciate it. For some editors who "opted in" into the whole soxred thing you can do it, but most haven't. Other than that, the only thing I can think of is to take an editor's last 1000 or so contributions and see what % were to ANI, AE etc. But that's a buttload of work at this point.
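The "last 1000 or so contributions" idea can be sketched in a few lines. This is only an illustration: it assumes the page titles of an editor's recent edits have already been collected into a plain list, and the prefix lists below are hypothetical stand-ins for "ANI, AE etc." and "reviewing GAs and FAs" that would need tuning against real page names.

```python
from collections import Counter

# Hypothetical prefixes for noticeboard-type pages (assumption, not a complete list).
DRAMA_PREFIXES = (
    "Wikipedia:Administrators' noticeboard",
    "Wikipedia:Arbitration/Requests/Enforcement",
)
# Hypothetical prefixes for review-type pages (assumption, not a complete list).
REVIEW_PREFIXES = (
    "Wikipedia:Featured article candidates",
    "Wikipedia:Good article nominations",
    "Template talk:Did you know",
)


def bucket(title):
    """Assign one edited page title to 'drama', 'review' or 'other'."""
    if title.startswith(DRAMA_PREFIXES):
        return "drama"
    if title.startswith(REVIEW_PREFIXES):
        return "review"
    return "other"


def contribution_shares(titles):
    """Return the percentage of edits falling into each bucket."""
    total = len(titles)
    if total == 0:
        return {}
    counts = Counter(bucket(t) for t in titles)
    return {name: 100.0 * n / total for name, n in counts.items()}


# Made-up sample of ten edits, purely to show the output shape.
sample = (
    ["Wikipedia:Administrators' noticeboard/Incidents"] * 3
    + ["Wikipedia:Featured article candidates/Some article"]
    + ["Kubla Khan"] * 6
)
shares = contribution_shares(sample)
```

Two editors with the same "Wikipedia" namespace percentage would then separate on the drama and review shares, which is the distinction the namespace totals cannot make.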

BTW, Malleus is a very clear outlier. Very high % in article space and pretty high % epp. Very clearly a "content contributor". Giano not so much (though still in that cell).

Update:

Here's a bit of what I have so far:

Again, the basic problem is that given the data, in the "warm colors" category (red and orange) it is impossible to distinguish people who use WP:whatever type pages (the blue pages) for what could essentially be considered legitimate uses (reviewing FAs etc.) vs. people who are fucking around (playing on ANI, politicking on talk pages)

Also, related to the other thread, someone like Dr. Blofeld shows up as a "wiki gnome" because they mass create a lot of one or two sentence stubs. This means their article % is high, but since he never goes back to see what happened to the children he sired he has a low epp. In this case I think "wiki gnome" is not too inaccurate (cough cough), so I'm not bothered by this. Overall I think this illustrates some of the above discussion.
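The two dimensions discussed in this thread (average edits per page, and share of edits in article space) can be turned into a rough four-way classification. The sketch below is an assumption-laden reading of the matrix, not a copy of it: the cutoffs are hypothetical, and the label for the low-epp, low-article cell is a guess based on the category names mentioned in the posts.

```python
def classify(epp, article_pct, epp_cutoff=3.0, article_cutoff=50.0):
    """Place an editor in one of four cells.

    epp          -- average edits per page
    article_pct  -- percentage of edits made in article space
    Both cutoffs are hypothetical and would need calibrating on real data.
    """
    high_epp = epp >= epp_cutoff
    high_article = article_pct >= article_cutoff
    if high_epp and high_article:
        return "content contributor"
    if high_article:
        # Many pages touched once or twice, mostly in article space.
        return "wiki gnome"
    if high_epp:
        # Repeated edits to the same non-article pages.
        return "drama queen"
    # Assumed label for the remaining cell.
    return "uses Wikipedia as Facebook"


# Made-up (epp, article %) pairs, one per cell.
examples = {
    (10.0, 80.0): classify(10.0, 80.0),
    (1.2, 90.0): classify(1.2, 90.0),
    (10.0, 20.0): classify(10.0, 20.0),
    (1.5, 10.0): classify(1.5, 10.0),
}
```

As noted above, this cannot tell a project reviewer from a noticeboard regular: both land in the high-epp, low-article cell.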

EricBarbour

QUOTE(radek @ Sun 30th October 2011, 1:28pm)

For example, Fetchcommons has 28.26% of his posts to user's talk. Sandstein has 24.71%. SarekOfVulcan has 24.09%, Jehochman (who has a pretty high average edits per page - but that's not because he edits articles a lot) 28.93%, Georgewilliamherbert 34.69%, BWilkins 39.26% etc.

All of whom are notoriously contentious and abusive admins.
And none of whom adds very much in article content.

This chart is actually not bad, although there are some exceptions (but not very many).
Bear in mind that many of those "wiki gnomes" are heavy users of bots that scrape
from other websites. I would call them something more descriptive, like "Benders".

(That's because Futurama is an extremely popular subject among WP admins....)

radek

QUOTE(EricBarbour @ Sun 30th October 2011, 6:29pm)

Bear in mind that many of those "wiki gnomes" are heavy users of bots that scrape
from other websites. I would call them something more descriptive, like "Benders".

Yes, very much so, as the chart right above illustrates. Here's where you get into semantics - mass creating a bunch of next-to-useless stubs is "gnomish" so the category is appropriate, just not refined enough. I do wish that I could somehow just a get a huge data dump on all active (more than 100 edits per month) editors, to see what the relative supply of each kind is.

EricBarbour

QUOTE(Peter Damian @ Sun 30th October 2011, 3:05pm)

However, under 'content creators' there is a further subdivision into those who create long and boring articles sourced from weather and hurricane reports, or articles about video games, and those who don't.

Yes, there is a reasonably clear division between obsessives and people who write on varied
article subjects. The real obsessives have truly disturbing contribs. Like the several guys who
can't stop talking about hurricanes, or the Doctor Who nerds.

This is why Jimbo's old comment that he "didn't want bias" from obsessive editors is so pathetic.
Because that's exactly what he's got. In spades.

timbo

Radek's Chart really nails it.

Silver seren makes a good point that some excellent content creators probably write outside of mainspace and then transfer everything at once. So there would be substantial content creation with minimal edits per page in this situation. I suppose a really scientific study would somehow include kilobytes of content incorporated into the first edit of a newly created page and weigh that into the equation.

Myself, I like to build a framework, write a lead, add a couple lines to subsections and a source or two to keep the wolves at bay, and then to do the writing in mainspace.

As to the question why gnomish administrative sorts have high status and content creators low status, that's an ongoing sore spot with me. I think that part of it is to be corrected by simple consciousness raising among those who write. Speaking for myself, I felt somehow vindicated or rewarded or whatever the term is when I was given autoreviewed status -- so that new articles come through the front gate without being highlighted in yellow and therefore tampered with by gnomish administrative sorts for no good reason.

The New Articles spooler is akin to a shark tank sometimes, some of those reviewing new work are only semi-competent, working too fast, meddling too much. Obviously, there's a lot of swill rolling through the door that needs to be stopped, but it's still a source of annoyance to just get started and then have a series of edit conflicts with meddlesome new page patrollers.

My prescription for WP would be to have autoreviewed status made into a bigger deal as a mechanism for rewarding content creators.

I also wouldn't mind the gnomish sorts being taken down a peg by renaming "administrators" as "janitors." That would balance the field. But that's pettiness on my part, I suppose, owing to an aversion to people of that personality type and their clique mentality...

I think there are some administrative tools that would be useful for content creators. Being able to see deleted files would be a boon now and then -- but that ultimately is a pretty minor tidbit in the big scheme of things; certainly nothing worth undergoing the Lord of the Flies gauntlet of THOSE people.

tim

radek

QUOTE(timbo @ Sun 30th October 2011, 8:57pm)

That chart really nails it.

Silver seren makes a good point that some excellent content creators probably write outside of mainspace and then transfer everything at once. So there would be substantial content creation with minimal edits per page in this situation. I suppose a really scientific study would somehow include kilobytes of content incorporated into the first edit of a newly created page and weigh that into the equation.

Myself, I like to build a framework, write a lead, add a couple lines to subsections and a source or two to keep the wolves at bay, and then to do the writing in mainspace.

As to the question why gnomish administrative sorts have high status and content creators low status, that's an ongoing sore spot with me. I think that part of it is to be corrected by simple consciousness raising among those who write. Speaking for myself, I felt somehow vindicated or rewarded or whatever the term is when I was given autoreviewed status -- so that new articles come through the front gate without being highlighted in yellow and therefore tampered with by gnomish administrative sorts for no good reason.

The New Articles spooler is akin to a shark tank sometimes, some of those reviewing new work are only semi-competent, working too fast, meddling too much. Obviously, there's a lot of swill rolling through the door that needs to be stopped, but it's still a source of annoyance to just get started and then have a series of edit conflicts with meddlesome new page patrollers.

My prescription for WP would be to have autoreviewed status made into a bigger deal as a mechanism for rewarding content creators.

I also wouldn't mind the gnomish sorts being taken down a peg by renaming "administrators" as "janitors." That would balance the field. But that's pettiness on my part, I suppose, owing to an aversion to people of that personality type and their clique mentality...

I think there are some administrative tools that would be useful for content creators. Being able to see deleted files would be a boon now and then -- but that ultimately is a pretty minor tidbit in the big scheme of things; certainly nothing worth undergoing the Lord of the Flies gauntlet of THOSE people.

tim

In case you're wondering you're in the "Wiki Gnome" category. Which, perhaps, just goes to show, that the above chart is about the TYPE of contributions and not really about the QUALITY of such - in my mind it's useful to get the TYPE distribution down first. I mean, some AN/I commentatin' might actually be "of quality" or something. But I don't really know you so it could be quality.

Second, the "autoreviewer" thing is a joke. Anyone who has managed to make a few edits without getting blocked as a vandal can get it. They throw these bones to you to make you think you're "important". You're not. (Neither am I)

timbo

QUOTE(EricBarbour @ Sun 30th October 2011, 4:29pm)

QUOTE(radek @ Sun 30th October 2011, 1:28pm)

For example, Fetchcommons has 28.26% of his posts to user's talk. Sandstein has 24.71%. SarekOfVulcan has 24.09%, Jehochman (who has a pretty high average edits per page - but that's not because he edits articles a lot) 28.93%, Georgewilliamherbert 34.69%, BWilkins 39.26% etc.

All of whom are notoriously contentious and abusive admins. ***

I noticed Sarek's name on the Resigned Administrators list today. I've bumped bellies with him in the past once or twice... I don't think he's a bad person, just not temperamentally suited for power tools, in my estimation. I think maybe he has come to the same conclusion.

Here's hoping he has second wind as a content creator...

tim

timbo

QUOTE(radek @ Sun 30th October 2011, 7:11pm)

Second, the "autoreviewer" thing is a joke. Anyone who has managed to make a few edits without getting blocked as a vandal can get it. They throw these bones to you to make you think you're "important". You're not. (Neither am I)

It made my life easier and less stressful and I value it.

Everybody is important and nobody is important. Getting some acknowledgement that one's work is noticed by others is good. Give a cowardly lion a medal and it makes him courageous.

tim

Peter Damian

A message to me from a Wikipedian.

QUOTE

I wish you all the best, and look forward to the book, but hope you realise you have a blind spot, and that you do not bend the statistic you gather to reinforce what you already beieve. I think you are a good guy, have always though that, but you've been through the wars here, dont let it affect your critical distance as it will inevitably be used against you. Feel free to talk to me anytime, I have the same facination. Youd be surprised how many of us think similar to you, but more critically and with more self awareness. Still, best. Ceoil (talk) 01:38, 31 October 2011 (UTC)

OK I need to develop this 'critical awareness' about Wikipedia. Can anyone help me here?

dogbiscuit

QUOTE(Peter Damian @ Mon 31st October 2011, 9:50am)

A message to me from a Wikipedian.

QUOTE

I wish you all the best, and look forward to the book, but hope you realise you have a blind spot, and that you do not bend the statistic you gather to reinforce what you already beieve. I think you are a good guy, have always though that, but you've been through the wars here, dont let it affect your critical distance as it will inevitably be used against you. Feel free to talk to me anytime, I have the same facination. Youd be surprised how many of us think similar to you, but more critically and with more self awareness. Still, best. Ceoil (talk) 01:38, 31 October 2011 (UTC)

OK I need to develop this 'critical awareness' about Wikipedia. Can anyone help me here?

He is simply saying that you are biased by your experiences (or at minimum are seen as biased by your experiences) and you need to see that.

That you have a number of hypotheses about Wikipedia that can be construed as anti-project is probably fair comment; whether your experiences have made you uncritical and you cannot see that in yourself I wouldn't care to judge.

communicat

QUOTE(Peter Damian @ Mon 31st October 2011, 10:50am)

A message to me from a Wikipedian.

QUOTE

I wish you all the best, and look forward to the book, but hope you realise you have a blind spot, and that you do not bend the statistic you gather to reinforce what you already beieve. I think you are a good guy, have always though that, but you've been through the wars here, dont let it affect your critical distance as it will inevitably be used against you. Feel free to talk to me anytime, I have the same facination. Youd be surprised how many of us think similar to you, but more critically and with more self awareness. Still, best. Ceoil (talk) 01:38, 31 October 2011 (UTC)

OK I need to develop this 'critical awareness' about Wikipedia. Can anyone help me here?

Maybe try critical awareness of Mark Twain's famous quote: "Lies, damned lies, and statistics". ?

thekohser

Try this, Peter. Say five nice things about Wikipedia, and say them like you mean them. We can then see if you have the ability to objectively evaluate that cess pit.

communicat

QUOTE(Peter Damian @ Mon 31st October 2011, 10:50am)

A message to me from a Wikipedian.

QUOTE

I wish you all the best, and look forward to the book, but hope you realise you have a blind spot, and that you do not bend the statistic you gather to reinforce what you already beieve. I think you are a good guy, have always though that, but you've been through the wars here, dont let it affect your critical distance as it will inevitably be used against you. Feel free to talk to me anytime, I have the same facination. Youd be surprised how many of us think similar to you, but more critically and with more self awareness. Still, best. Ceoil (talk) 01:38, 31 October 2011 (UTC)

OK I need to develop this 'critical awareness' about Wikipedia. Can anyone help me here?

By "critical awareness", he/she may be referring to an analysis that draws from knowledge across the social sciences and humanities -- not one that appears presently to be relying exclusively on a quantitative analytical approach (both in this topic and in its current fork marked "Content contributors").

I agree with Ceoil that you (and others in the discussions) may be striving to support with statistical evidence a hypothesis that you have already, prematurely formed; and thus provide your pre-selected hypothesis with a veneer of empirical respectability. As the discussions show, there are just too many variables involved for any convincing objective, quantitative "proof" to emerge. Forget about the maths and the empiricism and the "logic"; try a qualitative approach, which allows for a measure of subjectivity.

Malleus

QUOTE(communicat @ Mon 31st October 2011, 5:36pm)

QUOTE(Peter Damian @ Mon 31st October 2011, 10:50am)

A message to me from a Wikipedian.

QUOTE

I wish you all the best, and look forward to the book, but hope you realise you have a blind spot, and that you do not bend the statistic you gather to reinforce what you already beieve. I think you are a good guy, have always though that, but you've been through the wars here, dont let it affect your critical distance as it will inevitably be used against you. Feel free to talk to me anytime, I have the same facination. Youd be surprised how many of us think similar to you, but more critically and with more self awareness. Still, best. Ceoil (talk) 01:38, 31 October 2011 (UTC)

OK I need to develop this 'critical awareness' about Wikipedia. Can anyone help me here?

By "critical awareness", he/she may be referring to an analysis that draws from knowledge across the social sciences and humanities -- not one that appears presently to be relying exclusively on a quantitative analytical approach (both in this topic and in its current fork marked "Content contributors").

I agree with Ceoil that you (and others in the discussions) may be striving to support with statistical evidence a hypothesis that you have already, prematurely formed; and thus provide your pre-selected hypothesis with a veneer of empirical respectability. As the discussions show, there are just too many variables involved for any convincing objective, quantitative "proof" to emerge. Forget about the maths and the empiricism and the "logic"; try a qualitative approach, which allows for a measure of subjectivity.

I think that demonstrates a fundamental misunderstanding of the scientific method, perhaps one that Peter shares. The point of a hypothesis is to state it in such a way that it is susceptible to empirical investigation designed to disprove it, not to prove it. And to suggest that a qualitative approach may be more objective than a quantitative one is just risible.

Peter Damian

QUOTE(communicat @ Mon 31st October 2011, 5:36pm)

I agree with Ceoil that you (and others in the discussions) may be striving to support with statistical evidence a hypothesis that you have already, prematurely formed; and thus provide your pre-selected hypothesis with a veneer of empirical respectability. As the discussions show, there are just too many variables involved for any convincing objective, quantitative "proof" to emerge. Forget about the maths and the empiricism and the "logic"; try a qualitative approach, which allows for a measure of subjectivity.

I suspect you are an idiot. Can you please read carefully the original post http://ocham.blogspot.com/2011/10/repetiti...-wikipedia.html and please tell me whether I was advancing or proving any hypothesis, and if so, what hypothesis you think I was advancing or trying to prove?

Please note the bit in the post that says "before you leap to conclusions".

[edit] Think also of this limiting case. I write an entire new article off-wiki, and then save it onto Wikipedia, links and all, and I never return to that article. I then write another article off-wiki and save that into Wikipedia. Repeat another 98 times. Thus I have written 100 complete articles, of the sort that would normally require 1,000's of edits. Yet my average epp = 1, exactly. I was suggesting that we shouldn't leap to the natural conclusion that low epp = low value, or not 'content creator' or anything like that.
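The limiting case can be checked with a small calculation. This sketch assumes epp means total edits divided by the number of distinct pages edited; the page titles and the second editor's figures are invented for illustration.

```python
from collections import Counter


def edits_per_page(edited_titles):
    """Average edits per distinct page, given one title per edit."""
    per_page = Counter(edited_titles)
    if not per_page:
        return 0.0
    return sum(per_page.values()) / len(per_page)


# Editor A: 100 finished articles, each saved in a single edit.
off_wiki_writer = [f"Article {i}" for i in range(100)]

# Editor B (hypothetical): 5 articles built up over 40 edits each.
on_wiki_writer = [f"Article {i}" for i in range(5) for _ in range(40)]

epp_a = edits_per_page(off_wiki_writer)  # 100 edits / 100 pages
epp_b = edits_per_page(on_wiki_writer)   # 200 edits / 5 pages
```

Editor A comes out at exactly 1 despite writing twenty times as many articles as editor B, which is why a low epp on its own says nothing about the amount of content contributed.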

thekohser

QUOTE(Peter Damian @ Mon 31st October 2011, 2:12pm)

I suspect you are an idiot.

None of my experiments involving Communicat have been able to disprove that hypothesis, Peter.

radek

QUOTE(Peter Damian @ Mon 31st October 2011, 1:12pm)

QUOTE(communicat @ Mon 31st October 2011, 5:36pm)

I agree with Ceoil that you (and others in the discussions) may be striving to support with statistical evidence a hypothesis that you have already, prematurely formed; and thus provide your pre-selected hypothesis with a veneer of empirical respectability. As the discussions show, there are just too many variables involved for any convincing objective, quantitative "proof" to emerge. Forget about the maths and the empiricism and the "logic"; try a qualitative approach, which allows for a measure of subjectivity.

I suspect you are an idiot. Can you please read carefully the original post http://ocham.blogspot.com/2011/10/repetiti...-wikipedia.html and please tell me whether I was advancing or proving any hypothesis, and if so, what hypothesis you think I was advancing or trying to prove?

Please note the bit in the post that says "before you leap to conclusions".

[edit] Think also of this limiting case. I write an entire new article off-wiki, and then save it onto Wikipedia, links and all, and I never return to that article. I then write another article off-wiki and save that into Wikipedia. Repeat another 98 times. Thus I have written 100 complete articles, of the sort that would normally require 1,000's of edits. Yet my average epp = 1, exactly. I was suggesting that we shouldn't leap to the natural conclusion that low epp = low value, or not 'content creator' or anything like that.

That's right, I'm the one who went ahead and made that leap for you (with caveats and stuff)

Ottava

One of the things I noticed is that even if you narrow down who are content contributors, you still have a lot of problematic contributors.

Take this for example. The guy altered many cited statements and made them 100% opposite of what the source says. The guy then adds a lot of blatant original research contradicted by other parts of the page that are cited. This is a highly read page and though he was reverted twice with people pointing out that he was adding original research, he is still allowed to continue it and his additions are now the current version of the page.

These people are rampant.

EricBarbour

QUOTE(Ottava @ Mon 31st October 2011, 12:32pm)

Take this for example (http://en.wikipedia.org/w/index.php?title=Kubla_Khan&diff=458271697&oldid=454934213). The guy altered many cited statements and made them 100% opposite of what the source says. The guy then adds a lot of blatant original research contradicted by other parts of the page that are cited. This is a highly read page and though he was reverted twice with people pointing out that he was adding original research, he is still allowed to continue it and his additions are now the current version of the page.

These people are rampant.

Those are "subtle vandals". I think there might be a few hundred of them, usually sticking to
certain areas (like the guy who uses his IP address to falsify British football statistics).
Wikipedia cannot deal with them, it is too corrupt and incompetent. One can't even figure out
how much subtle vandalism is going on because their changes look legitimate. It might be
possible to write a complex script to check simple things like sports stats, but you'd need a
verifiable database to check against, and it would be a big job. The people who could and SHOULD
do this, the guys who write editing and vandalism bots, won't. Because they would have to work
very hard to produce a script that is reliable, and because they don't care. Diddling Wikipedia is
supposed to be "fun", not work.

At the end of the day, Wikipedia is not an "encyclopedia". It is a fundraising scam.

They have to produce statistics that show the volunteer userbase isn't declining, so they wave
around the increase of total articles and the raw edit stats.
Nothing about the QUALITY of those articles. Nothing about the QUALITY of the edits.
Figuring out "quality" would cost a lot of money and their remaining "dedicated" volunteer
community is full of total raving flakes and fools, who don't want to hear there is a "problem".
So no one makes important changes, more and more bots generate crap content, and the
whole thing slowly declines.

I meant what I said in the other thread: Wikipedia will go the way of dmoz.org.