Oversight logs
tarantino
For the last 6 months the toolserver has been hosting the logs of all oversights performed on WP prior to January 2008. It's in the public home directory of Daniel Erenrich, who's partly responsible for the Wikiscanner project at Cal Tech. There's some interesting stuff in there.
I have a spreadsheet of the data here.
privatemusings
Presumably this is a bungle on somebody's part? Would I be correct in assuming that, generally speaking, these logs would be private? I guess logs from now on will be?

I'm browsing now, but am probably a bit too stupid to figure out quite what the data means... I see the infamous FT2 oversights are there though....
thekohser
I'll bet the oversighted contributions of User:Belginusanl to the "Child" article were of possible interest to law enforcement. Here's an edit of his that didn't get oversighted, which gives some indication that this guy is a collector of photographs of children.
Werdna648
For the record, this did not make use of any toolserver-only resources. It was a heuristic script to figure out what had *probably* been oversighted.

His script could just as easily been run against the live site, and got the same results.
Peter Damian
QUOTE(thekohser @ Fri 26th June 2009, 12:41pm) *

I'll bet the oversighted contributions of User:Belginusanl to the "Child" article were of possible interest to law enforcement. Here's an edit of his that didn't get oversighted, which gives some indication that this guy is a collector of photographs of children.


This user is quite disturbing.

http://en.wikipedia.org/wiki/Wikipedia:Sus...ets/Belginusanl

Seems to have a fascination with child abduction, as well as with naming individual children in photographs.

And still apparently contributing happily!

http://en.wikipedia.org/wiki/Special:Contr.../24.192.149.137
anthony
QUOTE(Werdna648 @ Fri 26th June 2009, 12:16pm) *

For the record, this did not make use of any toolserver-only resources. It was a heuristic script to figure out what had *probably* been oversighted.

His script could just as easily been run against the live site, and got the same results.


Did it use information on what was deleted? If not, how could you figure out what was oversighted vs. what was deleted?
MZMcBride
QUOTE(anthony @ Fri 26th June 2009, 9:55am) *

QUOTE(Werdna648 @ Fri 26th June 2009, 12:16pm) *

For the record, this did not make use of any toolserver-only resources. It was a heuristic script to figure out what had *probably* been oversighted.

His script could just as easily been run against the live site, and got the same results.


Did it use information on what was deleted? If not, how could you figure out what was oversighted vs. what was deleted?


My guess is that a script looked through dumps (or possibly even the RecentChanges feed) and compared against the live site at fixed points. For any discrepancies, it saved the rows to a file.

Not sure if or how well this would work for edits that only sat around exposed for a very short period of time. However, for the edits that were removed months or years later, it's a pretty easy way to generate a list like this.
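A minimal sketch of the comparison MZMcBride guesses at. It assumes you already have the revisions listed in an old dump as (page title, revision id) pairs, and the set of revision ids still visible on the live site. The function name and data shapes are hypothetical. Real dumps would need XML parsing, and the live set would have to come from the API or a database replica.

```python
from collections import defaultdict

def find_missing_revisions(dump_revs, live_rev_ids):
    """Group revisions present in an old dump but absent from the live site, by page.

    dump_revs: iterable of (page_title, rev_id) pairs from a dump (hypothetical format).
    live_rev_ids: set of revision ids currently visible on the live site.
    """
    missing = defaultdict(list)
    for title, rev_id in dump_revs:
        if rev_id not in live_rev_ids:
            missing[title].append(rev_id)
    return dict(missing)

dump = [("Foo", 1), ("Foo", 2), ("Bar", 3), ("Bar", 4)]
live = {1, 3, 4}
print(find_missing_revisions(dump, live))  # {'Foo': [2]}
```

On its own this only says a revision vanished. It cannot tell an ordinary deletion from an oversight, which is exactly anthony's objection.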
MBisanz
QUOTE(MZMcBride @ Fri 26th June 2009, 3:44pm) *

QUOTE(anthony @ Fri 26th June 2009, 9:55am) *

QUOTE(Werdna648 @ Fri 26th June 2009, 12:16pm) *

For the record, this did not make use of any toolserver-only resources. It was a heuristic script to figure out what had *probably* been oversighted.

His script could just as easily been run against the live site, and got the same results.


Did it use information on what was deleted? If not, how could you figure out what was oversighted vs. what was deleted?


My guess is that a script looked through dumps (or possibly even the RecentChanges feed) and compared against the live site at fixed points. For any discrepancies, it saved the rows to a file.

Not sure if or how well this would work for edits that only sat around exposed for a very short period of time. However, for the edits that were removed months or years later, it's a pretty easy way to generate a list like this.


Seeing as there are no File talk: pages on the list and I've requested about a dozen oversights on File talk: pages when patrolling RecentChanges, I suspect it requires a lengthy period of time between the edit and oversight.
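A rough illustration of why dump comparison would miss quickly oversighted edits. It assumes dumps are point-in-time snapshots. The monthly schedule below is made up for illustration. A revision can only be detected if at least one snapshot was taken while it was visible.

```python
from datetime import datetime

def detectable(created, removed, dump_times):
    """True if some dump snapshot t satisfies created <= t < removed,
    i.e. the revision was visible when at least one dump was taken."""
    return any(created <= t < removed for t in dump_times)

# Hypothetical monthly dump schedule for 2007.
dumps = [datetime(2007, m, 1) for m in range(1, 13)]

# Oversighted within an hour: falls between two dumps, never captured.
quick = (datetime(2007, 3, 10, 12, 0), datetime(2007, 3, 10, 13, 0))
# Visible for months before being oversighted: captured by several dumps.
slow = (datetime(2007, 3, 10), datetime(2007, 9, 15))

print(detectable(*quick, dumps))  # False
print(detectable(*slow, dumps))   # True
```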
tarantino
QUOTE(Werdna648 @ Fri 26th June 2009, 12:16pm) *

For the record, this did not make use of any toolserver-only resources. It was a heuristic script to figure out what had *probably* been oversighted.

His script could just as easily been run against the live site, and got the same results.


Are you speaking in your official capacity as a WMF staff member?

It's very plausible that he compared a meta-history dump to the database, or even compared two dumps. Then he could check the logs for every page that had missing edits to see if there were any admin deletions. The pages that didn't have admin deletions must then have been oversighted.
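The page-level rule tarantino describes can be sketched as follows. This assumes you already have the missing revisions grouped by page, plus the set of page titles that have any entry in the deletion log. Both inputs and the function name are hypothetical.

```python
def classify_page_level(missing_by_page, pages_with_deletion_log):
    """If a page has missing revisions but no deletion log entry, treat all of
    its missing revisions as oversighted. Pages with any deletion log entry
    are skipped as ambiguous."""
    oversighted = {}
    for title, rev_ids in missing_by_page.items():
        if title not in pages_with_deletion_log:
            oversighted[title] = sorted(rev_ids)
    return oversighted

missing = {"Foo": [2], "Bar": [7, 5]}
deleted = {"Bar"}  # "Bar" has a deletion log entry, so it is skipped
print(classify_page_level(missing, deleted))  # {'Foo': [2]}
```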
emesee
nothing to see here.

move along.
CharlotteWebb
QUOTE(tarantino @ Fri 26th June 2009, 7:41pm) *

It's very plausible that he compared a meta-history dump to the database, or even compared two dumps. Then he could check the logs for every page that had missing edits to see if there were any admin deletions. The pages that didn't have admin deletions must then have been oversighted.

If this were true one would expect to see (or rather not to see) a number of false negatives on this list.

That is, a page known to have had (different) edits removed both by oversight and later by normal deletion would not be expected to appear on this list because such a script as you describe could not know which edits were oversighted and which ones were normally deleted.

Yet this one does seem to know.
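CharlotteWebb's point can be shown with a toy case. The revision numbers are made up. Suppose one page lost revision 101 to oversight and revision 102 to an ordinary deletion. A rule that skips every page with a deletion log entry drops the page entirely, so the oversighted edit goes missing from the list.

```python
# Hypothetical page with two removed revisions: 101 was oversighted,
# 102 was later removed by an ordinary (selective) deletion.
missing_by_page = {"Example page": [101, 102]}
pages_with_deletion_log = {"Example page"}
truly_oversighted = {101}

# Page-level heuristic: any page with a deletion log entry is skipped entirely.
flagged = {
    rev
    for page, revs in missing_by_page.items()
    if page not in pages_with_deletion_log
    for rev in revs
}

false_negatives = truly_oversighted - flagged
print(sorted(flagged))          # []
print(sorted(false_negatives))  # [101]
```

If pages like this do appear on the published list, the script presumably had revision-level information about deletions, not just the deletion log.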
CharlotteWebb
Someone else can confirm this at their leisure but I'll bet dollars to donuts that the two unmarked columns in your spreadsheet (containing only 1's and 0's) indicate:
*Whether the oversighted edit was marked as "minor"
*Whether it was already deleted at the time of the oversight.

Survey says inside job.
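Guesses like these about the unlabelled columns can be checked mechanically against a known property of each row. The sketch below assumes the spreadsheet rows are loaded as dicts. The key "unmarked_2" is a hypothetical name for one of the unlabelled columns. Namespace 1 is Talk in MediaWiki. The same check works for the "minor" guess if you look up each revision's minor flag in a dump.

```python
def column_mismatches(rows, column, predicate):
    """Return rows where a 0/1 spreadsheet column disagrees with a hypothesis.

    An empty list means the column is consistent with the predicate on every row."""
    return [row for row in rows if bool(row[column]) != bool(predicate(row))]

# Hypothetical rows; "unmarked_2" stands in for one of the unlabelled columns.
rows = [
    {"namespace": 1, "unmarked_2": 1},  # Talk page
    {"namespace": 0, "unmarked_2": 0},  # article
    {"namespace": 4, "unmarked_2": 0},  # Wikipedia: namespace
]

# Hypothesis: the column is 1 exactly when the edit is in namespace 1 (Talk).
print(column_mismatches(rows, "unmarked_2", lambda r: r["namespace"] == 1))  # []
```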
anthony
QUOTE(CharlotteWebb @ Sat 27th June 2009, 8:06am) *

Someone else can confirm this at their leisure but I'll bet dollars to donuts that the two unmarked columns in your spreadsheet (containing only 1's and 0's) indicate:
*Whether the oversighted edit was marked as "minor"
*Whether it was already deleted at the time of the oversight.

Survey says inside job.


It certainly couldn't have "easily been run against the live site, and got the same results". You'd at least need old dumps. I'm not sure if this could be done with just the old dumps and the live database or not. It'd certainly be a lot *easier* if you had access to the edit #s of all the regularly deleted revisions. Simple question - does the toolserver have access to that?

The second unmarked column is 1 if and only if the namespace of the edit was "Talk". I haven't checked if the first is indeed "minor". I have enough old dumps that I could conceivably do this, but not enough inspiration yet.
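With the revision numbers of ordinarily deleted revisions available (in MediaWiki these are kept in the archive table), the classification can be done per revision instead of per page. That also avoids the mixed-page false negatives. A minimal sketch, assuming plain lists of revision ids for each source. The function name is hypothetical.

```python
def classify_revision_level(dump_rev_ids, live_rev_ids, archived_rev_ids):
    """Split revisions that vanished between a dump and the live site.

    Revisions found among archived (ordinarily deleted) ids are 'deleted';
    anything missing from both the live site and the archive is presumed
    oversighted."""
    missing = set(dump_rev_ids) - set(live_rev_ids)
    deleted = missing & set(archived_rev_ids)
    oversighted = missing - deleted
    return sorted(deleted), sorted(oversighted)

dump_ids = [100, 101, 102, 103]
live_ids = [100, 103]
archive_ids = [102]  # ids of ordinarily deleted revisions
print(classify_revision_level(dump_ids, live_ids, archive_ids))  # ([102], [101])
```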
anthony
QUOTE(privatemusings @ Fri 26th June 2009, 5:41am) *

I see the infamous FT2 oversights are there though....


It's definitely incomplete, though. It's missing quite a few infamous SlimVirgin oversights.
TungstenCarbide
QUOTE(tarantino @ Fri 26th June 2009, 7:41pm) *
Are you speaking in your official capacity as a WMF staff member?


After clicking your link, I stumbled onto this and this:

"It was created following a community consensus that Wales' Steward rights should be removed due to his inactivity with the right."

In the past I believe Jimbo has stated his 'God-King' status came from the community and tradition. Now it seems to come from the board.
CharlotteWebb
QUOTE(anthony @ Sat 27th June 2009, 8:02pm) *

It's definitely incomplete, though. It's missing quite a few infamous SlimVirgin oversights.

Depending on which you refer to, they may have been from before the revisions were stored in the "hidden" table (rather than deleted completely), or from before the act of doing so was logged.
sbrown
QUOTE(TungstenCarbide @ Sat 27th June 2009, 10:32pm) *

In the past I believe Jimbo has stated his 'God-King' status came from the community and tradition. Now it seems to come from the board.

What Jimbo's saying is the hell with the community, for which he has no respect. If they won't follow his diktat, he rides roughshod over them with the help of a toadying WMF board so he can remain the supreme leader. Link to Iran thread.

anthony
QUOTE(CharlotteWebb @ Sat 27th June 2009, 9:44pm) *

QUOTE(anthony @ Sat 27th June 2009, 8:02pm) *

It's definitely incomplete, though. It's missing quite a few infamous SlimVirgin oversights.

Depending on which you refer to, they may have been from before the revisions were stored in the "hidden" table (rather than deleted completely), or from before the act of doing so was logged.


I thought there was a log from the beginning. In fact, I thought it was initially a *public* log.

Now that I'm looking back, I think the user in question was "Slimv", not "SlimVirgin". The articles in question included [[Pierre Salinger]], [[Pan Am Flight 103]], and [[Mordechai Vanunu]]. Now maybe these were deletions, and not oversight, but I witnessed them in the dumps with my own eyes, so I know they used to be there and now they aren't. If anyone has access to view deleted revisions, check for edits by "Slimv" in November 2004 to those three articles.
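Anyone holding an old full-history dump from before the removal could look for those edits directly. This sketch assumes the MediaWiki XML export layout (page/title, revision/id/timestamp, contributor/username) and ISO 8601 timestamps. It strips whatever XML namespace the dump version uses. The sample XML, its schema version, and the revision ids are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

def local(tag):
    """Strip the XML namespace: '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]

def child(elem, name):
    """First direct child with the given local name, or None."""
    return next((c for c in elem if local(c.tag) == name), None)

def find_edits(dump_file, username, titles, month_prefix):
    """Yield (title, rev_id, timestamp) for revisions by `username` on the
    given pages whose timestamp starts with `month_prefix`, e.g. '2004-11'."""
    for _event, elem in ET.iterparse(dump_file, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title_el = child(elem, "title")
        if title_el is not None and title_el.text in titles:
            for rev in elem:
                if local(rev.tag) != "revision":
                    continue
                contributor = child(rev, "contributor")
                user_el = child(contributor, "username") if contributor is not None else None
                ts_el, id_el = child(rev, "timestamp"), child(rev, "id")
                if (user_el is not None and user_el.text == username
                        and ts_el is not None and ts_el.text.startswith(month_prefix)):
                    yield title_el.text, id_el.text, ts_el.text
        elem.clear()  # full-history dumps are huge; drop processed pages

sample = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.3/">
  <page>
    <title>Pan Am Flight 103</title>
    <revision>
      <id>1</id>
      <timestamp>2004-11-05T10:00:00Z</timestamp>
      <contributor><username>Slimv</username><id>42</id></contributor>
      <text>...</text>
    </revision>
    <revision>
      <id>2</id>
      <timestamp>2004-12-01T10:00:00Z</timestamp>
      <contributor><username>Slimv</username><id>42</id></contributor>
      <text>...</text>
    </revision>
  </page>
</mediawiki>"""

titles = {"Pierre Salinger", "Pan Am Flight 103", "Mordechai Vanunu"}
hits = list(find_edits(io.BytesIO(sample), "Slimv", titles, "2004-11"))
print(hits)  # [('Pan Am Flight 103', '1', '2004-11-05T10:00:00Z')]
```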
tarantino
QUOTE(anthony @ Sat 27th June 2009, 7:31pm) *

It certainly couldn't have "easily been run against the live site, and got the same results". You'd at least need old dumps. I'm not sure if this could be done with just the old dumps and the live database or not. It'd certainly be a lot *easier* if you had access to the edit #s of all the regularly deleted revisions. Simple question - does the toolserver have access to that?



Yes. Here are the fields available in the archive tables on the toolserver.

CODE
ar_namespace
ar_title
ar_text
ar_comment
ar_user
ar_user_text
ar_timestamp
ar_minor_edit
ar_flags
ar_rev_id
ar_text_id

MZMcBride
QUOTE(tarantino @ Sat 27th June 2009, 11:19pm) *

QUOTE(anthony @ Sat 27th June 2009, 7:31pm) *

It certainly couldn't have "easily been run against the live site, and got the same results". You'd at least need old dumps. I'm not sure if this could be done with just the old dumps and the live database or not. It'd certainly be a lot *easier* if you had access to the edit #s of all the regularly deleted revisions. Simple question - does the toolserver have access to that?



Yes. Here are the fields available in the archive tables on the toolserver.

CODE
ar_namespace
ar_title
ar_text
ar_comment
ar_user
ar_user_text
ar_timestamp
ar_minor_edit
ar_flags
ar_rev_id
ar_text_id



Actually, they recently restricted access to some of the table's columns.

CODE

mysql> DESCRIBE archive;
+---------------+-----------------+------+-----+---------+-------+
| Field         | Type            | Null | Key | Default | Extra |
+---------------+-----------------+------+-----+---------+-------+
| ar_namespace  | int(11)         | NO   |     | 0       |       |
| ar_title      | varchar(255)    | NO   |     |         |       |
| ar_user       | int(5) unsigned | NO   |     | 0       |       |
| ar_user_text  | varchar(255)    | NO   |     |         |       |
| ar_timestamp  | varchar(14)     | NO   |     |         |       |
| ar_minor_edit | tinyint(1)      | NO   |     | 0       |       |
| ar_flags      | tinyblob        | NO   |     | NULL    |       |
+---------------+-----------------+------+-----+---------+-------+
7 rows in set (0.00 sec)


Most notably, ar_comment (the edit summaries of deleted revisions) is no longer available.
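Without ar_rev_id, deleted revisions can no longer be matched by revision number. One possible workaround, assuming (namespace, title, timestamp, user) identifies a revision closely enough, is to key both the archive rows and the dump's missing revisions on the columns that remain. This sketch uses an in-memory sqlite3 table as a stand-in for the toolserver's MySQL replica. The rows are made up.

```python
import sqlite3

# In-memory stand-in for the restricted archive table (columns per the DESCRIBE output).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE archive (
    ar_namespace INTEGER, ar_title TEXT, ar_user INTEGER,
    ar_user_text TEXT, ar_timestamp TEXT, ar_minor_edit INTEGER, ar_flags BLOB)""")
conn.executemany(
    "INSERT INTO archive VALUES (?, ?, ?, ?, ?, ?, ?)",
    [(0, "Example_page", 42, "SomeUser", "20071105100000", 0, b"")],
)

def archived_keys(conn):
    """Match keys built from columns still exposed after ar_rev_id was hidden."""
    rows = conn.execute(
        "SELECT ar_namespace, ar_title, ar_timestamp, ar_user_text FROM archive")
    return set(rows)

# Hypothetical missing revisions from a dump, keyed the same way
# (titles with underscores, 14-digit timestamps, as stored in the database).
missing = {
    (0, "Example_page", "20071105100000", "SomeUser"),   # in archive: ordinary deletion
    (0, "Example_page", "20071106090000", "OtherUser"),  # not in archive: presumed oversighted
}
presumed_oversighted = missing - archived_keys(conn)
print(sorted(presumed_oversighted))
```

Two edits by the same user to the same page within the same second would collide under this key. Titles taken from a dump, which use spaces, would also need converting to the underscore form first.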