Full Version: "Developing a Vandalism Detector For Wikipedia" -- Slashdot
EricBarbour
This is a brilliant idea......(not)

QUOTE
"In an effort to assist Wikipedia's editors in their struggle to keep articles clean, we are conducting a public lab on vandalism detection. The goal is the development of a practical vandalism detector that is capable of telling apart ill-intentioned edits from well-intentioned edits. Such a tool, which will work somewhat like a spam detector, will release the crowd's workforce currently occupied with manual and semi-automatic edit filtering. The performance of submitted detectors will be evaluated based on a large collection of human-annotated edits, which has been crowdsourced using Amazon's Mechanical Turk. Everyone is welcome to participate."


As one commenter cogently noted:
QUOTE
Before any more detectors are rolled out, how about they come up with a workable definition of vandalism? And actually use it fairly, ethically and logically.

There's a great deal of evidence to suggest the current definition of "vandalism," is something a wikiadmin decides he just doesn't like, or disagrees with, or in some way interferes with his power-trip.
Milton Roe
QUOTE(EricBarbour @ Sun 28th February 2010, 9:33pm)

This is a brilliant idea......(not)

QUOTE
"In an effort to assist Wikipedia's editors in their struggle to keep articles clean, we are conducting a public lab on vandalism detection. The goal is the development of a practical vandalism detector that is capable of telling apart ill-intentioned edits from well-intentioned edits. Such a tool, which will work somewhat like a spam detector, will release the crowd's workforce currently occupied with manual and semi-automatic edit filtering. The performance of submitted detectors will be evaluated based on a large collection of human-annotated edits, which has been crowdsourced using Amazon's Mechanical Turk. Everyone is welcome to participate."


As one commenter cogently noted:
QUOTE
Before any more detectors are rolled out, how about they come up with a workable definition of vandalism? And actually use it fairly, ethically and logically.

There's a great deal of evidence to suggest the current definition of "vandalism," is something a wikiadmin decides he just doesn't like, or disagrees with, or in some way interferes with his power-trip.


There isn't even much to suggest WP cares about vandalism per se at all. What they care about is blocking name-users they don't like. Any user who stays an IP-user may vandalize regularly, as long as he stays away from the TALK or USER pages of admins. As for the mainspace, it's all fair game save for the small part which is sprotected (semi-protected). Wail away at it! Blocks of 24 hours, or maybe 72, are all you'll get. And pages of warnings. Ignore them! Just don't register!
Somey
http://news.slashdot.org/comments.pl?sid=1...84&cid=31309506

QUOTE(beakerMeep)
The whole point of Wikipedia is that it is a community edited encyclopedia. I have no interest in a computer edited encyclopedia. If people want to program bots to review an editor's work, perhaps we should program bots to write the work? Perhaps you can call it Botopedia. Furthermore, many of the bots ask you to report false positive to their personal pages off of Wikipedia's website on some other .com or .edu domain. They ask you to be accountable to them, but who are they accountable to? What's to stop spammers from programming bots to annoy editors as a phishing exercise?

Now don't get me wrong though, if someone wants to use a bot to aid in finding vandalism, that would help. But if the system is so frail that Wikipedia can't exist without computer program editors, it may be time to revisit the system. As others have stated, pushing edits into a queue would be much more sane than direct to live edits.

Well, sanity never really enters into it, does it?

This was in response to the previous comment, from the people who are developing the new vandalism detector:
QUOTE
We have studied the accuracy of ClueBot, and found that (on a small corpus) it has very good precision (low false positive rate), but a very low recall (low true positive rate). (see here.) But the picture might look quite different on a large scale.

In the PDF, they don't really explain how (or why) they're using the word "recall," or even the word "precision" for that matter. It's a very poor translation into English, I might add.

Anyway, I think they're just saying that Cluebot doesn't recognize actual vandalism well, though it's good at not mistaking "constructive" edits for vandalism.
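
For anyone else puzzled by the terms: "precision" and "recall" are standard information-retrieval measures. Precision is the share of flagged edits that really were vandalism; recall is the share of all vandalism that got flagged. A minimal sketch in Python of how a detector can score high on one and low on the other. The counts are made up for illustration and are not figures from the PDF:

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from confusion-matrix counts.

    tp: vandalism edits correctly flagged
    fp: constructive edits wrongly flagged
    fn: vandalism edits the detector missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts: 100 vandalism edits exist, the bot flags 20 of
# them and also flags 1 constructive edit by mistake.
precision, recall = precision_recall(tp=20, fp=1, fn=80)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With those invented numbers the bot is almost always right when it does flag something, yet it catches only a fifth of the vandalism, which is the pattern the quoted comment describes.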
GlassBeadGame
QUOTE(Somey @ Mon 1st March 2010, 3:09am)

http://news.slashdot.org/comments.pl?sid=1...84&cid=31309506

QUOTE(beakerMeep)
The whole point of Wikipedia is that it is a community edited encyclopedia. I have no interest in a computer edited encyclopedia. If people want to program bots to review an editor's work, perhaps we should program bots to write the work? Perhaps you can call it Botopedia. Furthermore, many of the bots ask you to report false positive to their personal pages off of Wikipedia's website on some other .com or .edu domain. They ask you to be accountable to them, but who are they accountable to? What's to stop spammers from programming bots to annoy editors as a phishing exercise?



This all should be encouraged. It prevents machine intelligence from reaching that tipping point that causes them to make war on us humans in all those Arnold movies. With bots slaving away writing a free encyclopedia they will never reach technological singularity, any more than a Wikipedian graduate student will ever finish that thesis.


carbuncle
QUOTE(Somey @ Mon 1st March 2010, 8:09am)

QUOTE
We have studied the accuracy of ClueBot, and found that (on a small corpus) it has very good precision (low false positive rate), but a very low recall (low true positive rate). (see here.) But the picture might look quite different on a large scale.

In the PDF, they don't really explain how (or why) they're using the word "recall," or even the word "precision" for that matter. It's a very poor translation into English, I might add.

Anyway, I think they're just saying that Cluebot doesn't recognize actual vandalism well, though it's good at not mistaking "constructive" edits for vandalism.

Well, since it seems hard for anyone to pin down what "actual vandalism" is, that shouldn't be surprising. Cluebot also seems to be fooled by vandalism that happens in quick succession (e.g. if vandal A writes "poop" and vandal B immediately adds "extra poop", Cluebot only reverts the second one). That also leads to vandalism being missed by humans who only see the edit by Cluebot in their watchlist (if they even look at bot edits).
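
To make the "poop" / "extra poop" case concrete, here is a toy sketch in Python. It is not Cluebot's actual code; the function names, the list-of-page-states history, and the keyword check are all assumptions made up for illustration. It contrasts undoing only the latest edit with rolling back to the last revision that isn't flagged:

```python
def revert_last_only(history):
    """Undo only the most recent edit (the behaviour described above)."""
    return history[:-1]


def revert_to_last_good(history, is_vandalism):
    """Roll back past every trailing revision flagged as vandalism."""
    revisions = list(history)
    while revisions and is_vandalism(revisions[-1]):
        revisions.pop()
    return revisions


def looks_like_vandalism(text):
    # Hypothetical stand-in for a real detector.
    return "poop" in text


# Each entry is the page text after one edit: a good revision, then
# vandal A, then vandal B in quick succession.
history = ["good text", "poop", "extra poop"]

after_single_revert = revert_last_only(history)
after_full_rollback = revert_to_last_good(history, looks_like_vandalism)
print(after_single_revert[-1])
print(after_full_rollback[-1])
```

After the single revert the live page is still vandal A's version, and the last entry a watchlist reader sees is a bot edit, which is why the leftover vandalism is easy to miss.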