> Poor Man's Flagged Revisions Fix



Milton Roe

The Poor Man’s Flagged-Revision Fix

(Actually, a number of them)

If you’re scraping WP for non-vandalized articles to put into a chip for a book-reader for the dusty children of Africa, or for children anywhere who can’t afford internet cafes and have no home computer/net access, how do you do it? You can’t just avoid all but semi-protected articles -- WP’s open editor policy has seen to it that there aren’t enough of them.

What you CAN do is go back in the history of the article to the last version saved by a nameuser. Even further, you can pick a nameuser who is not redlinked, showing that they at least have edited their talk page. This will miss a few doppelganger accounts like JzG and others who enjoy having red usernames, but this is not enough to affect statistics. Who really cares about versions JzG has approved anyway?

You could use this technique and still luck into a new nameuser vandal who hasn’t been blocked yet, but just going by non-red nameuser versions does pretty well. You’ll miss out on a few IP edits, but in any longer article, the odds that the last IP user edits have added anything of lasting value, are small. Even Cluebot knows this, and you can use Cluebot’s technique of which version to revert TO, to figure out which version to look AT.
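A minimal sketch of that selection rule, assuming the page history is already in hand as a newest-first list of revisions. The record layout, the `non_redlinked` set, and the sample usernames are all made up for illustration; this is not Cluebot's actual logic.

```python
import ipaddress

def is_ip_editor(username):
    """Return True if the username parses as an IPv4 or IPv6 address (an anonymous edit)."""
    try:
        ipaddress.ip_address(username)
        return True
    except ValueError:
        return False

def last_named_revision(revisions, non_redlinked):
    """Return the newest revision saved by a named, non-redlinked user.

    revisions: list of dicts with 'revid' and 'user', newest first (assumed layout).
    non_redlinked: set of usernames whose user/talk page exists (assumed input).
    """
    for rev in revisions:
        user = rev["user"]
        if not is_ip_editor(user) and user in non_redlinked:
            return rev
    return None  # no acceptable version in this history

# Hypothetical history, newest first.
history = [
    {"revid": 104, "user": "203.0.113.7"},      # IP edit, skipped
    {"revid": 103, "user": "BrandNewAccount"},  # red username, skipped
    {"revid": 102, "user": "OldHand"},
    {"revid": 101, "user": "198.51.100.23"},
]
print(last_named_revision(history, non_redlinked={"OldHand"}))
# {'revid': 102, 'user': 'OldHand'}
```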

Of course, it’s possible to do even better, by looking at the same two simple figures of merit that WP uses to “register” name users. A name user account requires 5 edits and 4 days (or something). But suppose we set the bar higher, and have it apply only to nameuser accounts which have 300 edits or more, and have been active more than a month? Now, we’ve screened out just about all the simple vandals. The only problem is that in order to easily generate this list of “time/edit” trusted users, we have to use some server-time-using tool to get at these statistics, to check every nameuser we find on a last non-IP version of a Wiki. At this point, we want some kind of look up table of trusted (old and active) nameusers.
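As a rough sketch of the higher bar, here is the two-number test and a lookup table built from it. The 300-edit figure comes from the paragraph above; treating "a month" as 30 days, and the account records themselves, are assumptions for illustration.

```python
from datetime import date

MIN_EDITS = 300            # threshold suggested above
MIN_ACCOUNT_AGE_DAYS = 30  # "more than a month", taken here as 30 days (assumption)

def is_trusted(edit_count, registered_on, today):
    """Apply both figures of merit: enough edits AND an old enough account."""
    account_age_days = (today - registered_on).days
    return edit_count >= MIN_EDITS and account_age_days > MIN_ACCOUNT_AGE_DAYS

def build_lookup(accounts, today):
    """Build the set of trusted usernames once, so each article check is a cheap lookup.

    accounts: dict of username -> (edit_count, registration date), a hypothetical layout.
    """
    return {
        name
        for name, (edits, registered_on) in accounts.items()
        if is_trusted(edits, registered_on, today)
    }

# Hypothetical accounts.
accounts = {
    "OldHand": (12000, date(2006, 3, 1)),
    "FastVandal": (450, date(2009, 10, 10)),  # many edits, but only days old
    "Lurker": (12, date(2005, 1, 1)),         # old, but hardly any edits
}
print(build_lookup(accounts, today=date(2009, 10, 15)))
# {'OldHand'}
```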

At this point, we observe that very high edit counts serve as proxy for minimal account age, since it’s more or less impossible to run up thousands of edits in less than the minimal time we’d like to make sure a vandal nameuser account has been “noticed” and blocked. So one thing we can do immediately to generate a list of “trusted” nameusers is to use the list of editors by edit count, and take all of them. They are at:

http://en.wikipedia.org/wiki/Wikipedia:Lis...number_of_edits

This ends at 4000 editors, who all have at least 8933 edits. We can take them all. If the list includes some inactive editors, and editors who have since been blocked for political problems or fighting or socking or whatever, it doesn’t matter. Blocked editors won’t be ones we’re querying for our latest article version, and even if they are (recently blocked) do we really care that the last version we’re reading is by somebody with 10,000 edits but recently blocked? Is their banning likely to be due to anything having to do with clearly erroneous content? Whatever they did, by definition, is likely to be POV-pushing type subtle, and we can probably stand to look at that.

Another interesting list is the 5000 editors who’ve made the most edits in the last 30 days, which currently means anybody who has made more than 117 edits in the last month:

http://en.wikipedia.org/wiki/Wikipedia:Lis...of_recent_edits

These also are unlikely to include any nameuser vandal who has made 117 edits (whether in a day, week or month) but hasn’t yet been caught, so all of these are probably useful. Of course, there will be some overlap with the list of editors with the most edits, but probably a large divergence also. Since we’re interested in a pool of editors much larger than is in either of these lists, we want the union of the two, not just the people who are on both lists (though such an intersection list would presumably pick out very active and also super-contributive nameusers).
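The difference between the two ways of combining the lists can be sketched with plain sets. The usernames below are placeholders, not entries from either real list.

```python
def trusted_pool(top_by_total_edits, top_by_recent_edits):
    """Union: anyone on EITHER list counts as trusted (the larger pool wanted here)."""
    return set(top_by_total_edits) | set(top_by_recent_edits)

def core_pool(top_by_total_edits, top_by_recent_edits):
    """Intersection: only editors on BOTH lists (long-standing and currently active)."""
    return set(top_by_total_edits) & set(top_by_recent_edits)

# Placeholder names standing in for the two scraped lists.
by_total = ["Alpha", "Bravo", "Charlie"]
by_recent = ["Charlie", "Delta"]

print(sorted(trusted_pool(by_total, by_recent)))  # ['Alpha', 'Bravo', 'Charlie', 'Delta']
print(sorted(core_pool(by_total, by_recent)))     # ['Charlie']
```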

Note that the present proposals for flagged, patrolled and sighted versions/revisions (none of them quite the same thing, see WP:FLAG), all have the basic problem that nobody can agree on the criteria for an editor to be a reviewer/patroller or trusted editor or article-promoter. See for instance the debate at: http://en.wikipedia.org/wiki/Wikipedia:Reviewers

Worse still, the “flag” proposals I’ve seen make article promotion into a manual time-consuming process, instead of the automatic thing it should be, whenever a nameuser who carries the “trusted” flag, edits and saves a version. This should automatically make it “sighted” to whatever specifications we trust THAT editor with. Right? Duh.

Many of these problems could be bypassed, if WP simply kept weekly track (no more often than weekly, is necessary) for two things for each editor: 1) total time in months since registering, and 2) total edits in blocks of 100, to the nearest 100. For editors who edit more than once a week, this could be done any time in the week that server time is available, and put somewhere in the file associated with the username (perhaps the same one that contains the password). As such, it would not be subject to manipulation. For editors who have not edited for more than a week, the update could be done on the spot, at their next edit.

It would remain for MediaWiki’s software to simply append these two numbers, in parentheses, after the username, whenever a user edits and saves any page. Thus, if you see in the history of a page that user:JoeBlow(24;200) has saved a page, you know that JoeBlow has been a user for 24 months and has 20,000 edits. And again, for our purposes, it DOES NOT MATTER if the figures aren’t absolutely up to date or accurate.
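To make the proposed notation concrete, here is a small sketch that writes and reads the `name(months;blocks)` tag. The tag format is the one proposed above; the function names and the round-half-up choice for "nearest 100" are assumptions, and nothing like this exists in MediaWiki.

```python
import re

# Optional "user:" prefix, then name(months;hundreds-of-edits).
TAG = re.compile(r"^(?:user:)?(?P<name>.+)\((?P<months>\d+);(?P<blocks>\d+)\)$")

def format_tagged_username(name, months, edits):
    """Append (months;edits in blocks of 100), rounding edits to the nearest 100."""
    blocks = (edits + 50) // 100
    return f"{name}({months};{blocks})"

def parse_tagged_username(tagged):
    """Return (name, months registered, approximate edit count)."""
    match = TAG.match(tagged)
    if match is None:
        raise ValueError(f"not a tagged username: {tagged!r}")
    return match["name"], int(match["months"]), int(match["blocks"]) * 100

print(parse_tagged_username("user:JoeBlow(24;200)"))  # ('JoeBlow', 24, 20000)
print(format_tagged_username("JoeBlow", 24, 20049))   # JoeBlow(24;200)
```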

Once this is done, we can do something remarkable. Not only can the entire project automatically set a “sighted” floor for which editors can flag an article by the mere act of saving it, but this “floor” can be easily changed at any time, to get the best outcome.

Moreover, if the project as a whole cannot agree on the “sighting” limits for editors (which seems likely), that doesn’t matter, either! Once these merit-numbers are associated with all nameusers, the reader/user can set the numbers to any value in their user-preferences. That is, they can choose, if they prefer, to read the last sighted version rather than the last raw version. (Of course, even if you’re reading sighted versions, you can always bring up the last raw version in the history if you want to see it, and it will automatically come up if you edit.)

Thus, if a reader wants to see just the last raw version, as now, he or she can set their preferences that way. If they want to see only versions of articles which have been saved by editors with (say) at least 6 months of wiki-experience and at least 1000 edits, they can do that, also. And change these personal thresholds at any time.
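A sketch of how per-reader thresholds might pick the version to display, assuming each revision already carries its editor's two merit numbers (with IP edits counted as 0 months and 0 edits). The tuple layout and revision ids are invented for the example.

```python
def version_to_show(history, min_months=0, min_edits=0):
    """Return the newest revision id whose editor meets the reader's thresholds.

    history: newest-first list of (revid, editor_months, editor_edits), assumed layout.
    With both thresholds at 0 this is simply the last raw version, as now.
    """
    for revid, months, edits in history:
        if months >= min_months and edits >= min_edits:
            return revid
    return None  # nothing in the history qualifies

# Hypothetical history, newest first.
history = [
    (504, 0, 0),       # IP edit
    (503, 2, 400),     # fairly new named user
    (502, 24, 20000),  # long-standing editor
]
print(version_to_show(history))                               # 504
print(version_to_show(history, min_months=6, min_edits=1000)) # 502
```

Because the thresholds are arguments rather than project-wide constants, changing a reader's preference changes only which existing revision is served.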

Note: yes, I know that far more complicated Taj-Mahal proposals have been made by MediaWiki lab people. One is that each nameuser carry around a figure of merit which encodes their sum total of bytes changed on WP, multiplied by the time each byte-change has lasted (or lasted till removed). This will indeed be the gold standard of content contribution. And it has been envisioned that this number can be used to change the shade of orange that each nameuser’s changes to an article appear in!

Thus, if you like, you can see that the darker a word, the more content the user who added it, has contributed.
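For comparison, the bytes-times-survival figure of merit described above reduces to a weighted sum. This is only a toy version under assumed inputs (times in days, one record per change); the real proposals track text survival far more carefully.

```python
def contribution_score(changes, now):
    """Sum of bytes changed multiplied by how long each change has lasted.

    changes: list of (bytes_changed, added_at, removed_at), times in days;
             removed_at is None if the change still stands (assumed layout).
    """
    total = 0
    for nbytes, added_at, removed_at in changes:
        end = now if removed_at is None else removed_at
        total += nbytes * (end - added_at)
    return total

# Hypothetical record: 100 bytes still standing since day 0,
# 50 bytes added on day 10 and removed on day 20.
changes = [(100, 0, None), (50, 10, 20)]
print(contribution_score(changes, now=30))  # 3500
```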

We should all live so long. Before the millennial vision of this arrives, perhaps we should try some of the simpler solutions outlined above. Most or all of them have the virtue of being easy to implement in software, and also that we don’t have to have community agreement on any of the “trust” thresholds for readers. Rather, each reader chooses for themselves. A horrid idea, no?

Perennially Proposing Milton

Sarcasticidealist

QUOTE(Milton Roe @ Thu 15th October 2009, 12:28am)

Worse still, the “flag” proposals I’ve seen make article promotion into a manual time-consuming process, instead of the automatic thing it should be, whenever a nameuser who carries the “trusted” flag, edits and saves a version. This should automatically make it “sighted” to whatever specifications we trust THAT editor with. Right? Duh.

That's not necessarily true. I certainly don't scan every article in which I insert a wikilink to make sure that it's free of anything that shouldn't go live, though I suppose the knowledge that my insertion of a wikilink would make it go live might change that. Or it might cause me to stop inserting wikilinks. Or it might cause me to register an alternative, untrusted, account to do my gnoming.

thekohser

Milton, with the kinetic energy it took you to write that, you could have powered an electric generator that would have provided the poor African girl with enough AC to log into Wikipedia for three days!

Problem with your theory: no real difference between unproductive editors who don't have their facts straight, and highly prolific editors who think they're helping make facts "better" by tidying up here and there.

Mike Ilitch was born in Detroit, not southeastern Europe.

Milton Roe

QUOTE(Sarcasticidealist @ Thu 15th October 2009, 8:17am)

QUOTE(Milton Roe @ Thu 15th October 2009, 12:28am)

Good point, and any of these are acceptable. Another obvious "fix" feature would be to arrange things so that saving of a section (what you ordinarily do when "gnoming") would NOT automatically "sighted flag" the article-- only saving the entire thing would do that. And of course, that means we'd need a section header for the LEAD/LEDE, since right now any work on the LEDE "section" requires you to save the whole article, when generally you really would rather not. (That could be fixed NOW, since it's a dumb feature of the Wiki software).

Milton Roe

QUOTE(thekohser @ Thu 15th October 2009, 9:52am)

Sure, but I'm teaching how to fish, not making fish.

QUOTE(Kohser)

Problem with your theory: no real difference between unproductive editors who don't have their facts straight, and highly prolific editors who think they're helping make facts "better" by tidying up here and there.

Mike Ilitch was born in Detroit, not southeastern Europe.

Kvetch, Kvetch. I propose a fix for the handling/steering problem in a car and there you are, complaining that my solution does nothing about the mileage it gets, or the crappy paint colors it comes in.

I am proposing a system to stop vandalism. The problem of subject matter expertise (SME) is a completely different one. It requires yet another "SME flag" when vetted for SME. This is the routine way we do it in the REAL world of science and technical writing, so don't tell me it's unworkable and unwieldy and unrealistic. I'm merely describing what we already do in publishing, outside Wikipedia.

The reasons for this division of labor go far beyond the fact that fixing spelling, grammar, punctuation, syntax and organization are quite different things than fact-checking. It's also necessary because SMEs and fact-checkers are a far rarer breed (particularly in some classes of article), and will be available to perform their jobs, and affix their particular flags, at far fewer intervals. Their abilities/qualifications are much more difficult to keep tabs on, also. We all recognize gratuitous profanity and unencyclopedic tone, and know how to check if a word is misspelled; and any decent writer can recognize a badly constructed sentence or paragraph. An error in history is harder to catch, and an error in comprehension for an arcane subject (particularly if mathematics is involved) is even harder.

WP conflates all these jobs-- that's one of its major problems!

Separation of types of flags and tags would at least help it get back to the road of realizing that old and productive editors can be trusted to catch most kinds of vandalism, but their "imprimatur" means little regarding the matter of correspondence to academically accepted reality. For that, you need another fix. But at least, in the meantime, you've fixed ONE major type of problem.

And again, the Best is ever the enemy of the Good.

Lar

QUOTE(Milton Roe @ Thu 15th October 2009, 1:33pm)

... And of course, that means we'd need a section header for the LEAD/LEDE, since right now any work on the LEDE "section" requires you to save the whole article, when generally you really would rather not. (That could be fixed NOW, since it's a dumb feature of the Wiki software).

Just edit section 0, I think. Lots of .js out there to insert a link to do that. I think it might even be a gadget. But yes, seems a misfeature to me.

Malleus

QUOTE(Lar @ Thu 15th October 2009, 6:57pm)

QUOTE(Milton Roe @ Thu 15th October 2009, 1:33pm)

Just edit section 0, I think. Lots of .js out there to insert a link to do that. I think it might even be a gadget. But yes, seems a misfeature to me.

It is a gadget, yes. I hardly ever edit the whole article, except when I'm moving stuff from one section to another.

Milton Roe

QUOTE(Lar @ Thu 15th October 2009, 10:57am)

Just edit section 0, I think.

Er, never use the phrase "I think" in conjunction with the previous action modifier "just."

If it's obvious how to do it, you'll know how to do it. I can't figure out how. If you get an algorithm, I'd appreciate detail.

So would every other WP editor.

Lar

QUOTE(Milton Roe @ Thu 15th October 2009, 4:55pm)

QUOTE(Lar @ Thu 15th October 2009, 10:57am)

Just edit section 0, I think.

Er, never use the phrase "I think" in conjunction with the previous action modifier "just."

If it's obvious how to do it, you'll know how to do it. I can't figure out how. If you get an algorithm, I'd appreciate detail.

So would every other WP editor.

tack section=0 on the end of the URL. That is:

http://en.wikipedia.org/w/index.php?title=...=edit&section=0
instead of
http://en.wikipedia.org/w/index.php?title=...Lar&action=edit
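The same URL trick can be sketched in a few lines. The `title`, `action` and `section` query parameters are the ones visible in the URLs above; the helper name and the page title "Example" are placeholders.

```python
from urllib.parse import urlencode

BASE = "http://en.wikipedia.org/w/index.php"

def edit_url(title, section=None):
    """Build an edit URL; section=0 targets the lead, None edits the whole page."""
    params = {"title": title, "action": "edit"}
    if section is not None:
        params["section"] = section
    return BASE + "?" + urlencode(params)

print(edit_url("Example", section=0))
# http://en.wikipedia.org/w/index.php?title=Example&action=edit&section=0
print(edit_url("Example"))
# http://en.wikipedia.org/w/index.php?title=Example&action=edit
```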

The gadget is the fourth one down in the UI section

" Add an [edit] link for the lead section of a page"

Go to your prefs and turn it on. If it doesn't work, dump your monobook.js temporarily to see if it's something about your javascript (change to whatever skin you use of course if you don't use monobook)

You seriously didn't know all this already and weren't trolling to see if I'd regurgitate it on demand?

Hope that helps.

Milton Roe

QUOTE(Lar @ Thu 15th October 2009, 2:11pm)

QUOTE(Milton Roe @ Thu 15th October 2009, 4:55pm)

QUOTE(Lar @ Thu 15th October 2009, 10:57am)

Just edit section 0, I think.

Er, never use the phrase "I think" in conjunction with the previous action modifier "just."

If it's obvious how to do it, you'll know how to do it. I can't figure out how. If you get an algorithm, I'd appreciate detail.

So would every other WP editor.

tack section=0 on the end of the URL. That is:

http://en.wikipedia.org/w/index.php?title=...=edit&section=0
instead of
http://en.wikipedia.org/w/index.php?title=...Lar&action=edit

The gadget is the fourth one down in the UI section

" Add an [edit] link for the lead section of a page"

Go to your prefs and turn it on. If it doesn't work, dump your monobook.js temporarily to see if it's something about your javascript (change to whatever skin you use of course if you don't use monobook)

You seriously didn't know all this already and weren't trolling to see if I'd regurgitate it on demand?

Hope that helps.

It does. No, I seriously didn't know it. Which is amazing, since it seems so obvious when you list it like that.

Say, is this one of those things that everybody else knows, but I somehow missed?

Like not starting with hot water when you boil eggs?

Or not to use liquid dishwashing soap in the dishwasher?

Malleus

QUOTE(Milton Roe @ Thu 15th October 2009, 10:19pm)

Say, is this one of those things that everybody else knows, but I somehow missed?

Yes.

Milton Roe

QUOTE(Malleus @ Thu 15th October 2009, 3:24pm)

QUOTE(Milton Roe @ Thu 15th October 2009, 10:19pm)

Say, is this one of those things that everybody else knows, but I somehow missed?

Yes.

Well, I'm embarrassed. I can see the section number in the http address for other sections when I click edit or refer to them, but who'd have guessed they'd have a section ZERO?? I suppose if computer wonks had done the calendar, the first century A.D. would have started with year 0 A.D. Which would definitely have screwed up Jesus' driver license. Y2K problems would have been nothing by comparison.

In Japan, where everything has a number, their first giant monster is not Monster 1, but Monster Zero.

So you see, it all makes sense.

Moulton

Shouldn't monsters be numbered on the imaginary axis?

Lar

QUOTE(Moulton @ Thu 15th October 2009, 6:41pm)

Shouldn't monsters be numbered on the imaginary axis?

Zero is a number on the imaginary axis, isn't it?

QUOTE(Milton Roe @ Thu 15th October 2009, 6:38pm)

In Japan, where everything has a number, their first giant monster is not Monster 1, but Monster Zero.

So you see, it all makes sense.

You watch too much anime, apparently.

Also, wouldn't the default assumption about any given piece of code be that it was written by a computer programmer (not in the tautological sense, but in the real sense) (i). So of course it's numbered starting at 0.

i - not to be confused with the imaginary sense. This is complex!

oh...
