I did some further work on the 'division of labour' within Wikipedia, following criticisms of the previous post. The two substantial criticisms were (1) that an epp average disguises individual page work by contributors who also edit many pages as part of, say, RC patrol (Karanacs is a good example of this), and (2) that the epp score covers all spaces: user, user talk, talk, Wikipedia pages, and so on.
The second study is in progress, due to the large amounts of data, but I include some preliminary results below. I took 5,000 edits from each of a sample of editors, including editors with a very low epp. Then I classified each edit into A (article page and talk), U (user page and talk), W (Wikipedia page and talk) and, for completeness, T (template page and talk). I then computed the distribution of edits per page for each of those spaces. I summarise the results in the table below. The first column is the editor name. The next four are the maximum number of edits to a single page in each of the four spaces. Then come four columns for the total number of edits in each space, followed by the sum of those four columns as a checksum. This does not always add up to 5,000, on account of edits that the database could not process. The final column is the ratio of article-space edits to user-space edits.
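As a rough illustration of the classification step just described, here is a minimal Python sketch. It is not the script used for the study. It assumes each edit is available as a full page title, that the standard MediaWiki namespace prefixes apply, and that a page and its talk page are counted as separate pages; the sample titles are invented for the example.

```python
from collections import Counter

# Grouping of namespace prefixes into the post's four spaces.
SPACE_BY_PREFIX = {
    "Talk": "A",
    "User": "U", "User talk": "U",
    "Wikipedia": "W", "Wikipedia talk": "W",
    "Template": "T", "Template talk": "T",
}
# Other namespaces are left unclassified (assumption: they fall outside A/U/W/T).
OTHER_PREFIXES = {"File", "File talk", "Category", "Category talk",
                  "Help", "Help talk", "Portal", "Portal talk"}


def classify(title):
    """Return 'A', 'U', 'W' or 'T' for a page title, or None if unclassified."""
    prefix, sep, _ = title.partition(":")
    if not sep:
        return "A"                      # no prefix: an article
    if prefix in SPACE_BY_PREFIX:
        return SPACE_BY_PREFIX[prefix]
    if prefix in OTHER_PREFIXES:
        return None
    return "A"                          # colon is part of an article title


def summarise(titles):
    """Per space: maximum edits to one page, total edits, distinct pages."""
    per_page = {space: Counter() for space in "AUWT"}
    for title in titles:
        space = classify(title)
        if space is not None:
            per_page[space][title] += 1
    return {
        space: {
            "max_edits": max(counts.values(), default=0),
            "total_edits": sum(counts.values()),
            "pages": len(counts),
        }
        for space, counts in per_page.items()
    }


# Invented sample: one entry per edit.
edits = ["Horse", "Horse", "Talk:Horse", "User talk:Example",
         "Wikipedia:Sandbox", "Wikipedia:Sandbox", "Wikipedia:Sandbox",
         "File:Example.png", "Star Trek: Voyager"]
summary = summarise(edits)
checksum = sum(s["total_edits"] for s in summary.values())
print(summary, checksum, len(edits))
```

The checksum here falls one short of the number of edits because the file-page edit is not classified, which is the same kind of shortfall the table's checksum column records.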
You can copy and paste the comma-separated text below into a spreadsheet column, and then separate it into individual columns by choosing Data/Text to Columns, then choosing comma as the delimiter.
Tentative conclusions as follows.
1. There is a strong, probably 100%, correlation in this sample between epp below 2 and 'dispersive editing'. By 'dispersive editing' I mean never making more than a few edits to one page. The most extreme example of this is Gaius Cornelius, whose highest edit count for any one article page is 3. All the low-epp editors analysed so far are like this.
2. There is a strong correlation between average epp > 5 and non-dispersive editing. For example, Wehwalt has a maximum of 630 edits to a single page. COGDEN, Nev1 and SlimVirgin are similar.
3. Point 2 does not mean that editors with an epp of 2-5 are all dispersive. Some editors with a medium epp, like Karanacs, have non-dispersive editing characteristics. Their epp is presumably pulled down because they are contaminating their content creation with dispersive work.
4. There is an observable, though not strong, correlation between non-dispersive article editing and non-dispersive user page and user talk editing. That is, editors who contribute a lot to a small number of articles also contribute a lot to certain talk pages. Editors who are dispersive, i.e. who do not contribute much to any single article, also do not contribute much to any single user/user talk page. This is either because they just don't talk at all (Gaius Cornelius, edward and Xezbeth hardly contribute to user space at all) or because they do, but in a mechanical way. E.g. WOSlinker contributes mostly 1 edit to over 1,000 user/user talk pages (he is adding some kind of template). These editors rarely contribute to Wikipedia space.
5. Most editors in the sample had a high percentage of contributions to article space. The lowest percentages were: (1) Baseball Bugs. Of the approx 5,000 edits sampled, only 858 were to article pages, contrasting with over 1,000 edits to user pages and 2,894 to Wikipedia pages (mostly ANI). (2) Georgewilliamherbert, who has a similar distribution. (3) Giano (listed in the table as GiacomoReturned), whose article space contribution is brought down by article editing in his user space.
6. Certain editors spend a lot of time in Wikipedia space. As noted above, BB and GWH spend a lot of time on ANI. (Indeed, you might ask exactly what they are contributing to the project.) Karanacs and SlimVirgin are also high contributors there, in SV's case because she spends much time on policy pages.
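The percentages behind conclusions 5 and 6 can be recomputed from the per-space totals. The following is a small Python sketch, not part of the original analysis: the figures are copied from a handful of rows of the table, and the function name `space_shares` is made up for the illustration.

```python
# (editor, A total, U total, W total, T total), copied from the table.
ROWS = [
    ("Gaius Cornelius", 4990, 7, 3, 0),
    ("Wehwalt", 3058, 1101, 735, 32),
    ("WOSlinker", 2037, 1117, 129, 1710),
    ("Karanacs", 2264, 1408, 1266, 60),
    ("Baseball Bugs", 858, 1197, 2894, 2),
    ("GiacomoReturned", 1391, 2657, 732, 6),
    ("Georgewilliamherbert", 1064, 2079, 1848, 9),
]


def space_shares(row):
    """Return (editor, {space: fraction of that editor's classified edits})."""
    name, *counts = row
    total = sum(counts)
    return name, {space: n / total for space, n in zip("AUWT", counts)}


shares = dict(space_shares(row) for row in ROWS)

# Editors with the smallest share of edits in article space.
lowest_article = sorted(shares, key=lambda name: shares[name]["A"])[:3]
for name in lowest_article:
    print(f"{name}: {shares[name]['A']:.1%} article, {shares[name]['W']:.1%} Wikipedia")
```

On these rows the three lowest article shares belong to Baseball Bugs, Georgewilliamherbert and GiacomoReturned, in that order, which agrees with conclusion 5.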
That's all for now. What am I trying to achieve by this? I am looking for different kinds of evidence for the underlying hypothesis that Wikipedia is not a magically new emergentist phenomenon that came about as the result of Web 2.0. Rather, it's an old-fashioned project that uses management, division of labour into different kinds of task, and various kinds of expertise, to create a product. It even has leaders and management committees, although Wikipedians pretend these are not what they really are.
And I shall go on to argue that, where Wikipedia is strong, it is for all the reasons you would expect (namely, conventional management and organisation). And, where it is weak, it is weak for entirely the reasons you would expect (namely, edit warring, lack of editorial control, inability to recruit good editors).
In fact, the overall argument I want to make is the exact opposite of this one: "The ability for any random person to show up and edit any page at any time, on an equal footing with oldtimers, is the most frightening and appalling way to run a website that I can imagine. It's the secret of our success. --Jimbo "
User,A max,U max,W max,T max,A total,U total,W total,T total,Total,A/U ratio
Gaius Cornelius,3,1,2,0,4990,7,3,0,5000,712.86
J04n,4,6,3,2,4955,16,19,9,4999,309.69
Xezbeth,20,2,1,1,4960,20,7,9,4996,248.00
edward,4,18,0,0,4975,25,0,0,5000,199.00
Merovingian,11,21,2,2,4951,41,2,6,5000,120.76
Andres,19,26,5,1,4933,41,15,1,4990,120.32
JaGa,32,34,29,6,4558,89,185,168,5000,51.21
Neelix,27,29,104,13,4286,237,229,223,4975,18.08
Nev1,451,103,22,6,4520,297,157,17,4991,15.22
David Gerard,66,145,35,10,4217,326,375,34,4952,12.94
Billinghurst,14,65,21,12,4302,368,205,114,4989,11.69
Fram,24,86,46,14,4088,451,394,67,5000,9.06
COGDEN,586,123,97,15,3887,614,398,72,4971,6.33
SlimVirgin,305,217,258,17,3055,583,1181,173,4992,5.24
Slrubenstein,278,52,204,4,3192,847,931,7,4977,3.77
Malleus Fatuorum,256,439,78,4,3460,993,524,5,4982,3.48
Wehwalt,630,273,124,7,3058,1101,735,32,4926,2.78
Snowolf,9,37,20,2,3175,1693,113,11,4992,1.88
WOSlinker,3,23,7,12,2037,1117,129,1710,4993,1.82
Moni3,299,399,42,7,3019,1663,290,19,4991,1.82
Smalljim,34,45,24,9,3104,1741,111,43,4999,1.78
Marine 69-71,138,750,16,21,3088,1807,47,44,4986,1.71
Karanacs,361,305,137,15,2264,1408,1266,60,4998,1.61
Baseball Bugs,55,331,807,1,858,1197,2894,2,4951,0.72
GiacomoReturned,202,727,148,3,1391,2657,732,6,4786,0.52
Georgewilliamherbert,24,316,1156,3,1064,2079,1848,9,5000,0.51
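For anyone who would rather check the table in code than in a spreadsheet, here is a minimal Python sketch using the standard `csv` module. It reads the columns by position, confirms that the four per-space totals add up to the checksum column, and recomputes the A/U ratio. Only three rows are embedded to keep it short, and the function name `check_rows` is made up for the example.

```python
import csv
import io

# Header plus three rows copied from the table above.
SAMPLE = """\
User,A,U,W,T,A total,U total,W total,T total,Total total,A/U ratio
Gaius Cornelius,3,1,2,0,4990,7,3,0,5000,712.86
Nev1,451,103,22,6,4520,297,157,17,4991,15.22
Baseball Bugs,55,331,807,1,858,1197,2894,2,4951,0.72
"""


def check_rows(text):
    """Return (editor, checksum_ok, ratio_ok) for each data row."""
    reader = csv.reader(io.StringIO(text))
    next(reader)                               # skip the header row
    results = []
    for row in reader:
        totals = [int(x) for x in row[5:9]]    # A, U, W, T totals
        checksum = int(row[9])
        ratio = float(row[10])
        sum_ok = sum(totals) == checksum
        # Guard against an editor with no user-space edits at all.
        ratio_ok = totals[1] > 0 and abs(totals[0] / totals[1] - ratio) < 0.01
        results.append((row[0], sum_ok, ratio_ok))
    return results


results = check_rows(SAMPLE)
for name, sum_ok, ratio_ok in results:
    print(name, sum_ok, ratio_ok)
```

Pasting the full table in place of `SAMPLE` runs the same check over every editor.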