Full Version: Poor Man's Flagged Revisions Fix
Milton Roe
The Poor Man’s Flagged-Revision Fix
(Actually, a number of them)

If you’re scraping WP for non-vandalized articles to put into a chip for a book-reader for the dusty children of Africa, or for children anywhere who can’t afford internet cafes and have no home computer/net access, how do you do it? You can’t simply restrict yourself to semi-protected articles: WP’s open-editing policy has ensured that there aren’t enough of them.

What you CAN do is go back in the history of the article to the last version saved by a nameuser. Going further, you can pick a nameuser who is not redlinked, showing that they have at least edited their talk page. This will miss a few doppelganger accounts like JzG and others who enjoy having red usernames, but not enough of them to affect the statistics. Who really cares about versions JzG has approved anyway?

You could use this technique and still luck into a new nameuser vandal who hasn’t been blocked yet, but just going by non-red nameuser versions does pretty well. You’ll miss out on a few IP edits, but in any longer article, the odds that the latest IP edits have added anything of lasting value are small. Even Cluebot knows this, and you can use Cluebot’s technique for choosing which version to revert TO, to figure out which version to look AT.
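A minimal sketch of that walk-back, assuming the page history has already been fetched into a newest-first list. The dict keys (`id`, `user`, `redlinked`) and the function names are hypothetical, made up for illustration; nothing here calls the real MediaWiki API.

```python
import ipaddress

def is_ip_editor(name):
    """True if the editor name parses as an IPv4/IPv6 address (an anonymous edit)."""
    try:
        ipaddress.ip_address(name)
        return True
    except ValueError:
        return False

def last_nameuser_revision(history, skip_redlinked=True):
    """Return the newest revision saved by a registered user, or None.

    `history` is newest-first; each item is a dict with the hypothetical
    keys 'id', 'user' and 'redlinked' (True for a red username).
    """
    for rev in history:
        if is_ip_editor(rev["user"]):
            continue  # IP edit: keep walking back
        if skip_redlinked and rev["redlinked"]:
            continue  # registered, but red-linked: keep walking back
        return rev
    return None

# Made-up history, newest revision first.
history = [
    {"id": 104, "user": "192.0.2.17", "redlinked": True},
    {"id": 103, "user": "BrandNewName", "redlinked": True},
    {"id": 102, "user": "LongTimeEditor", "redlinked": False},
    {"id": 101, "user": "2001:db8::1", "redlinked": True},
]
picked = last_nameuser_revision(history)
```

With `skip_redlinked=False` the same walk stops one step earlier, at the first registered editor of any kind.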

Of course, it’s possible to do even better by looking at the same two simple figures of merit that WP uses to “register” name users. A name user account requires 5 edits and 4 days (or something). But suppose we set the bar higher, and have it apply only to nameuser accounts which have 300 edits or more and have been active for more than a month? Now we’ve screened out just about all the simple vandals. The only problem is that in order to generate this list of “time/edit” trusted users easily, we have to use some tool that eats server time to look up these statistics for every nameuser we find on the last non-IP version of an article. At this point, we want some kind of lookup table of trusted (old and active) nameusers.

At this point, we observe that very high edit counts serve as a proxy for minimum account age, since it’s more or less impossible to run up thousands of edits in less than the minimum time we’d want for a vandal nameuser account to be “noticed” and blocked. So one thing we can do immediately to generate a list of “trusted” nameusers is to use the list of editors by edit count, and take all of them. The list is at:

http://en.wikipedia.org/wiki/Wikipedia:Lis...number_of_edits

This ends at 4000 editors, who all have at least 8933 edits. We can take them all. If the list includes some inactive editors, and editors who have since been blocked for political problems or fighting or socking or whatever, it doesn’t matter. Blocked editors won’t be ones we’re querying for our latest article version, and even if they are (recently blocked) do we really care that the last version we’re reading is by somebody with 10,000 edits but recently blocked? Is their banning likely to be due to anything having to do with clearly erroneous content? Whatever they did, by definition, is likely to be POV-pushing type subtle, and we can probably stand to look at that.

Another interesting list is the 5000 editors who’ve made the most edits in the last 30 days, which currently means anybody who has made more than 117 edits in the last month:

http://en.wikipedia.org/wiki/Wikipedia:Lis...of_recent_edits

These also are unlikely to include any nameuser vandal who has made 117 edits (whether in a day, week or month) but hasn’t yet been caught, so all of these are probably useful. Of course, there will be some overlap with the list of editors with the most edits, but probably a large divergence also. Since we’re interested in a pool of editors much larger than either of these lists, we want the union of the two, not just the people who are on both (though such an intersection list would presumably pick out very active and also super-contributive nameusers).
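As a rough sketch of the difference, assuming the two lists have already been copied into plain Python lists of usernames (the names below are invented, and the function names are hypothetical):

```python
def trusted_pool(most_edits, most_recent_edits):
    """Everyone who appears on either list (the union)."""
    return set(most_edits) | set(most_recent_edits)

def core_pool(most_edits, most_recent_edits):
    """Only the editors who appear on both lists (the intersection)."""
    return set(most_edits) & set(most_recent_edits)

# Invented usernames standing in for the two real lists.
by_total_edits = ["Alpha", "Bravo", "Charlie"]
by_recent_edits = ["Charlie", "Delta"]

pool = trusted_pool(by_total_edits, by_recent_edits)
core = core_pool(by_total_edits, by_recent_edits)
```

A set also gives fast membership checks, which is what a lookup table of trusted nameusers needs.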

Note that the present proposals for flagged, patrolled and sighted versions/revisions (none of them quite the same thing, see WP:FLAG), all have the basic problem that nobody can agree on the criteria for an editor to be a reviewer/patroller or trusted editor or article-promoter. See for instance the debate at: http://en.wikipedia.org/wiki/Wikipedia:Reviewers

Worse still, the “flag” proposals I’ve seen make article promotion into a manual, time-consuming process, instead of the automatic thing it should be whenever a nameuser who carries the “trusted” flag edits and saves a version. This should automatically make it “sighted” to whatever specifications we trust THAT editor with. Right? Duh.

Many of these problems could be bypassed if WP simply kept track, weekly (no more often than that is necessary), of two things for each editor: 1) total time in months since registering, and 2) total edits in blocks of 100, to the nearest 100. For editors who edit more than once a week, this could be done at any time in the week that server time is available, and stored somewhere in the record associated with the username (perhaps the same one that contains the password). As such, it would not be subject to manipulation. For editors who have not edited for more than a week, the update could be done on the spot, at their next edit.

It would remain for MediaWiki’s software to simply append these two numbers, in parentheses, after the username, whenever a user edits and saves any page. Thus, if you see in the history of a page that user:JoeBlow(24;200) has saved a page, you know that JoeBlow has been a user for 24 months and has 20,000 edits. And again, for our purposes, it DOES NOT MATTER if the figures aren’t absolutely up to date or accurate.
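To make the notation concrete, here is a small sketch of reading such a tag back into numbers. The "(months;blocks of 100)" layout is the one proposed above; the regular expression and the function name are assumptions for illustration, since no such tag exists in MediaWiki.

```python
import re

# name, then "(months;blocks)" at the very end of the string
TAG = re.compile(r"^(?P<name>.+)\((?P<months>\d+);(?P<blocks>\d+)\)$")

def parse_tagged_user(tagged):
    """Split 'Name(months;blocks)' into (name, months, edits).

    Edits are stored in blocks of 100, so the second number is
    multiplied back out to an approximate edit count.
    """
    match = TAG.match(tagged)
    if match is None:
        raise ValueError("no (months;blocks) tag found: %r" % tagged)
    return match["name"], int(match["months"]), int(match["blocks"]) * 100

name, months, edits = parse_tagged_user("JoeBlow(24;200)")
```

Because the pattern is anchored at the end of the string, a username that itself contains parentheses should still parse, as only the final parenthesized pair is treated as the tag.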

Once this is done, we can do something remarkable. Not only can the entire project automatically set a “sighted” floor for which editors can flag an article by the mere act of saving it, but this “floor” can be easily changed at any time, to get the best outcome.

Moreover, if the project as a whole cannot agree on the “sighting” limits for editors (which seems likely), that doesn’t matter, either! Once these merit-numbers are associated with all nameusers, the reader/user can set the thresholds to any value in their user preferences. That is, they can choose, if they prefer, to read the last sighted version rather than the last raw version. (Of course, even if you’re reading sighted versions, you can always bring up the last raw version in the history if you want to see it; and it will automatically come up if you edit.)

Thus, if a reader wants to see just the last raw version, as now, he or she can set their preferences that way. If they want to see only versions of articles which have been saved by editors with (say) at least 6 months of wiki-experience and at least 1000 edits, they can do that, also. And change these personal thresholds at any time.
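A sketch of how per-reader thresholds might pick the version to display, assuming each saved revision carries the saving editor's two numbers. `ReaderPrefs`, `revision_to_show` and the tuple layout are hypothetical names for illustration, not anything MediaWiki provides.

```python
from dataclasses import dataclass

@dataclass
class ReaderPrefs:
    """A reader's personal floor; the defaults mean 'show the raw version'."""
    min_months: int = 0
    min_edits: int = 0

def revision_to_show(history, prefs):
    """Return the id of the newest revision whose editor meets the floor.

    `history` is newest-first: (rev_id, editor_months, editor_edits).
    Returns None if no revision qualifies; the caller decides the fallback.
    """
    for rev_id, months, edits in history:
        if months >= prefs.min_months and edits >= prefs.min_edits:
            return rev_id
    return None

# Made-up history, newest revision first.
history = [(504, 0, 0), (503, 2, 300), (502, 24, 20000)]

raw = revision_to_show(history, ReaderPrefs())
cautious = revision_to_show(history, ReaderPrefs(min_months=6, min_edits=1000))
```

Changing the floor is just a matter of passing different preferences; nothing about the stored history has to change.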

Note: yes, I know that far more complicated Taj-Mahal proposals have been made by MediaWiki lab people. One is that each nameuser carry around a figure of merit which encodes their sum total of bytes changed on WP, multiplied by the time each byte-change has lasted (or lasted till removed). This would indeed be the gold standard of content contribution. And it has been envisioned that this number could be used to change the shade of orange in which each nameuser’s changes to an article appear! Thus, if you like, you can see that the darker a word, the more content the user who added it has contributed.

We should all live so long. Before the millennial vision of this arrives, perhaps we should try some of the simpler solutions outlined above. Most or all of them have the virtue of being easy to implement in software, and also that we don’t have to have community agreement on any of the “trust” thresholds for readers. Rather, each reader chooses for themselves. A horrid idea, no?

Perennially Proposing Milton
Sarcasticidealist
QUOTE(Milton Roe @ Thu 15th October 2009, 12:28am) *
Worse still, the “flag” proposals I’ve seen make article promotion into a manual, time-consuming process, instead of the automatic thing it should be whenever a nameuser who carries the “trusted” flag edits and saves a version. This should automatically make it “sighted” to whatever specifications we trust THAT editor with. Right? Duh.
That's not necessarily true. I certainly don't scan every article in which I insert a wikilink to make sure that it's free of anything that shouldn't go live, though I suppose the knowledge that my insertion of a wikilink would make it go live might change that. Or it might cause me to stop inserting wikilinks. Or it might cause me to register an alternative, untrusted, account to do my gnoming.
thekohser
Milton, with the kinetic energy it took you to write that, you could have powered an electric generator that would have provided the poor African girl with enough AC to log into Wikipedia for three days!

Problem with your theory: no real difference between unproductive editors who don't have their facts straight, and highly prolific editors who think they're helping make facts "better" by tidying up here and there.

Mike Ilitch was born in Detroit, not southeastern Europe.
Milton Roe
QUOTE(Sarcasticidealist @ Thu 15th October 2009, 8:17am) *

QUOTE(Milton Roe @ Thu 15th October 2009, 12:28am) *
Worse still, the “flag” proposals I’ve seen make article promotion into a manual, time-consuming process, instead of the automatic thing it should be whenever a nameuser who carries the “trusted” flag edits and saves a version. This should automatically make it “sighted” to whatever specifications we trust THAT editor with. Right? Duh.
That's not necessarily true. I certainly don't scan every article in which I insert a wikilink to make sure that it's free of anything that shouldn't go live, though I suppose the knowledge that my insertion of a wikilink would make it go live might change that. Or it might cause me to stop inserting wikilinks. Or it might cause me to register an alternative, untrusted, account to do my gnoming.

Good point, and any of these are acceptable. Another obvious "fix" feature would be to arrange things so that saving of a section (what you ordinarily do when "gnoming") would NOT automatically "sighted flag" the article-- only saving the entire thing would do that. And of course, that means we'd need a section header for the LEAD/LEDE, since right now any work on the LEDE "section" requires you to save the whole article, when generally you really would rather not. (That could be fixed NOW, since it's a dumb feature of the Wiki software).
Milton Roe
QUOTE(thekohser @ Thu 15th October 2009, 9:52am) *

Milton, with the kinetic energy it took you to write that, you could have powered an electric generator that would have provided the poor African girl with enough AC to log into Wikipedia for three days!

Sure, but I'm teaching how to fish, not making fish.

QUOTE(Kohser)

Problem with your theory: no real difference between unproductive editors who don't have their facts straight, and highly prolific editors who think they're helping make facts "better" by tidying up here and there.

Mike Ilitch was born in Detroit, not southeastern Europe.

Kvetch, Kvetch. I propose a fix for the handling/steering problem in a car and there you are, complaining that my solution does nothing about the mileage it gets, or the crappy paint colors it comes in.

I am proposing a system to stop vandalism. The problem of subject matter expertise (SME) is a completely different one. It requires yet another "SME flag" when vetted for SME. This is the routine way we do it in the REAL world of science and technical writing, so don't tell me it's unworkable and unwieldy and unrealistic. I'm merely describing what we already do in publishing, outside Wikipedia.

The reasons for this division of labor go far beyond the fact that fixing spelling, grammar, punctuation, syntax and organization are quite different things than fact-checking. It's also necessary because SMEs and fact-checkers are a far rarer breed (particularly in some classes of article), and will be available to perform their jobs, and affix their particular flags, at far fewer intervals. Their abilities/qualifications are much more difficult to keep tabs on, also. We all recognize gratuitous profanity and unencyclopedic tone, and know how to check if a word is misspelled; and any decent writer can recognize a badly constructed sentence or paragraph. An error in history is harder to catch, and an error in comprehension for an arcane subject (particularly if mathematics is involved) is even harder.

WP conflates all these jobs-- that's one of its major problems! Separation of types of flags and tags would at least help it get back on the road to realizing that old and productive editors can be trusted to catch most kinds of vandalism, but their "imprimatur" means little regarding the matter of correspondence to academically accepted reality. For that, you need another fix. But at least, in the meantime, you've fixed ONE major type of problem.

And again, the Best is ever the enemy of the Good.
Lar
QUOTE(Milton Roe @ Thu 15th October 2009, 1:33pm) *

... And of course, that means we'd need a section header for the LEAD/LEDE, since right now any work on the LEDE "section" requires you to save the whole article, when generally you really would rather not. (That could be fixed NOW, since it's a dumb feature of the Wiki software).

Just edit section 0, I think. Lots of .js out there to insert a link to do that. I think it might even be a gadget. But yes, seems a misfeature to me.
Malleus
QUOTE(Lar @ Thu 15th October 2009, 6:57pm) *

QUOTE(Milton Roe @ Thu 15th October 2009, 1:33pm) *

... And of course, that means we'd need a section header for the LEAD/LEDE, since right now any work on the LEDE "section" requires you to save the whole article, when generally you really would rather not. (That could be fixed NOW, since it's a dumb feature of the Wiki software).

Just edit section 0, I think. Lots of .js out there to insert a link to do that. I think it might even be a gadget. But yes, seems a misfeature to me.

It is a gadget, yes. I hardly ever edit the whole article, except when I'm moving stuff from one section to another.
Milton Roe
QUOTE(Lar @ Thu 15th October 2009, 10:57am) *

Just edit section 0, I think.



Er, never use the phrase "I think" in conjunction with the previous action modifier "just." If it's obvious how to do it, you'll know how to do it. I can't figure out how. If you get an algorithm, I'd appreciate detail.

So would every other WP editor.
Lar
QUOTE(Milton Roe @ Thu 15th October 2009, 4:55pm) *

QUOTE(Lar @ Thu 15th October 2009, 10:57am) *

Just edit section 0, I think.



Er, never use the phrase "I think" in conjunction with the previous action modifier "just." If it's obvious how to do it, you'll know how to do it. I can't figure out how. If you get an algorithm, I'd appreciate detail.

So would every other WP editor.


tack section=0 on the end of the URL. That is:

http://en.wikipedia.org/w/index.php?title=...=edit&section=0
instead of
http://en.wikipedia.org/w/index.php?title=...Lar&action=edit
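For anyone who wants to build that link rather than type it, a small sketch using only the standard library. The function name and the page title "Example" are placeholders chosen for illustration; the `title`, `action` and `section` parameters are the ones visible in the URLs above.

```python
from urllib.parse import urlencode

def edit_url(title, section=None, base="http://en.wikipedia.org/w/index.php"):
    """Build an edit link; section=0 targets the lead, None edits the whole page."""
    params = {"title": title, "action": "edit"}
    if section is not None:  # 0 is a valid section number, so test against None
        params["section"] = section
    return base + "?" + urlencode(params)

lead_link = edit_url("Example", section=0)
whole_page_link = edit_url("Example")
```

Note the explicit `is not None` check: a plain truthiness test would silently drop section 0, which is exactly the section being asked for.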

The gadget is the fourth one down in the UI section

" Add an [edit] link for the lead section of a page"

Go to your prefs and turn it on. If it doesn't work, dump your monobook.js temporarily to see if it's something about your javascript (change to whatever skin you use of course if you don't use monobook)

You seriously didn't know all this already and weren't trolling to see if I'd regurgitate it on demand?

Hope that helps.
Milton Roe
QUOTE(Lar @ Thu 15th October 2009, 2:11pm) *

QUOTE(Milton Roe @ Thu 15th October 2009, 4:55pm) *

QUOTE(Lar @ Thu 15th October 2009, 10:57am) *

Just edit section 0, I think.



Er, never use the phrase "I think" in conjunction with the previous action modifier "just." If it's obvious how to do it, you'll know how to do it. I can't figure out how. If you get an algorithm, I'd appreciate detail.

So would every other WP editor.


tack section=0 on the end of the URL. That is:

http://en.wikipedia.org/w/index.php?title=...=edit&section=0
instead of
http://en.wikipedia.org/w/index.php?title=...Lar&action=edit

The gadget is the fourth one down in the UI section

" Add an [edit] link for the lead section of a page"

Go to your prefs and turn it on. If it doesn't work, dump your monobook.js temporarily to see if it's something about your javascript (change to whatever skin you use of course if you don't use monobook)

You seriously didn't know all this already and weren't trolling to see if I'd regurgitate it on demand?

Hope that helps.

It does. No, I seriously didn't know it. Which is amazing, since it seems so obvious when you list it like that.

Say, is this one of those things that everybody else knows, but I somehow missed? Like not starting with hot water when you boil eggs? Or not to use liquid dishwashing soap in the dishwasher?
Malleus
QUOTE(Milton Roe @ Thu 15th October 2009, 10:19pm) *
Say, is this one of those things that everybody else knows, but I somehow missed?

Yes.
Milton Roe
QUOTE(Malleus @ Thu 15th October 2009, 3:24pm) *

QUOTE(Milton Roe @ Thu 15th October 2009, 10:19pm) *
Say, is this one of those things that everybody else knows, but I somehow missed?

Yes.

Well, I'm embarrassed. I can see the section number in the http address for other sections when I click edit or refer to them, but who'd have guessed they'd have a section ZERO?? I suppose if computer wonks had done the calendar, the first century A.D. would have started with year 0 A.D. Which would definitely have screwed up Jesus' driver's license. Y2K problems would have been nothing by comparison.

In Japan, where everything has a number, their first giant monster is not Monster 1, but Monster Zero. So you see, it all makes sense.
Moulton
Shouldn't monsters be numbered on the imaginary axis?
Lar
QUOTE(Moulton @ Thu 15th October 2009, 6:41pm) *

Shouldn't monsters be numbered on the imaginary axis?

Zero is a number on the imaginary axis, isn't it?

QUOTE(Milton Roe @ Thu 15th October 2009, 6:38pm) *

In Japan, where everything has a number, their first giant monster is not Monster 1, but Monster Zero. So you see, it all makes sense.

You watch too much anime, apparently.

Also, wouldn't the default assumption about any given piece of code be that it was written by a computer programmer (not in the tautological sense, but in the real sense) (i)? So of course it's numbered starting at 0.

i - not to be confused with the imaginary sense. This is complex!


oh... off-topic.