Full Version: Do it yourself checkuser
Peter Damian
Here is a nasty trick I just used to find out whether two accounts were the same user or not (in fact they weren't). Paste the following query into your browser for user XYZ.

http://en.wikipedia.org/w/index.php?title=...5000&target=XYZ

which gives you 5,000 edits prior to January 1 2008 for user XYZ. Select everything with Ctrl-A, copy it, then paste special into a spreadsheet (Edit, Paste Special, Values; that last option, Values, is very important, otherwise you get all sorts of formatting nonsense and will probably crash your PC).

If there are more than 5k edits, as is likely, repeat the process into the spreadsheet, taking care to delete irrelevant header and footer rows like the donation to the Wiki foundation nonsense.

Format the text strings. Assuming these are in column A, in column B put the formula =value(left(A1, 5)). This computes the ‘day fraction’ from the timestamp, i.e. midnight is 0, midday is 0.5, 9 o’clock in the morning is 0.375. Copy this formula down.

Finally, sort the two columns by column B, so that you get all the edits in order, regardless of date, from midnight through to 23:59. Then graph it.

What you then get is a characteristic hockey stick graph. Even Wikiholics must sleep, and at that point the graph gets very steep – for those who never edit in the wee hours, it will jump up from the time they go to bed, then level off at the time they get up. There may be other humps corresponding perhaps to the time they go to work, but I haven’t seen this so far. Wikiholics just edit all through the day, pausing only for a few hours sleep.
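For anyone who would rather script it than use a spreadsheet, the same steps can be sketched in Python. This is only a sketch under assumptions: the timestamps are taken to be already parsed into `datetime` objects (getting them out of the contributions page is left out), and the names `day_fraction` and `sorted_fractions` and the sample data are made up for illustration.

```python
from datetime import datetime


def day_fraction(ts: datetime) -> float:
    """Fraction of the day elapsed: midnight is 0.0, midday is 0.5."""
    return (ts.hour * 3600 + ts.minute * 60 + ts.second) / 86400


def sorted_fractions(timestamps):
    """Day fractions of all edits, sorted regardless of date."""
    return sorted(day_fraction(t) for t in timestamps)


# Made-up sample timestamps (UTC), not real contribution data.
sample = [
    datetime(2007, 12, 1, 9, 0),
    datetime(2007, 12, 3, 12, 0),
    datetime(2007, 11, 20, 6, 0),
]
curve = sorted_fractions(sample)
```

Plotting the sorted fractions against their row number should give the kind of curve described above, with the steep part falling where the account is quiet.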

But the sleep pattern tells you two things. First, what time zone they are in. All Wiki timestamps are UTC (i.e. Greenwich Mean Time), so work out where the quiet stretch falls in UTC, compare it with suitable assumptions for local bedtime and getting-up time, and the offset tells you where they are working from. Extremely useful for detecting whether users are different (it will not tell you whether two users in the same time zone are different, however).
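The time zone guess can be sketched roughly as follows. Everything here is an assumption for illustration: the function names, the idea of treating the longest run of edit-free hours as sleep, and in particular the assumed local midpoint of sleep (03:30), which is exactly the kind of "suitable assumption" that can be wrong for night-shift workers and the like.

```python
def hourly_counts(fractions):
    """Bucket day fractions (0.0 to 1.0) into 24 hourly edit counts."""
    counts = [0] * 24
    for f in fractions:
        counts[int(f * 24) % 24] += 1
    return counts


def longest_quiet_run(counts, threshold=0):
    """(start_hour, length) of the longest run of hours with count <= threshold.

    The run is allowed to wrap around midnight.
    """
    best_start, best_len = None, 0
    for start in range(24):
        length = 0
        while length < 24 and counts[(start + length) % 24] <= threshold:
            length += 1
        if length > best_len:
            best_start, best_len = start, length
    return best_start, best_len


def utc_offset_guess(counts, assumed_local_sleep_mid=3.5):
    """Guess the UTC offset in hours, assuming sleep is centred on 03:30 local."""
    start, length = longest_quiet_run(counts)
    if length == 0:
        return None  # no quiet hour at all, nothing to go on
    utc_mid = (start + length / 2) % 24
    offset = (assumed_local_sleep_mid - utc_mid) % 24
    return offset - 24 if offset > 12 else offset
```

With made-up counts that are empty from 05:00 to 12:00 UTC, the guess comes out at five hours behind UTC. It is good to a couple of hours at best.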

Second, even if two users are in the same zone, there may be other differences in the patterns. Serious wikiholics do tend to edit in the small hours, thus the steep part of the slope will be less steep. For non-holics, by contrast, the steep bit is a cliff. There may be other bits corresponding to going to work, eating &c. For example, I did my own one and noticed most of my edits were in the early morning before I go to work, and in the early evening after supper, with none at all between 10 at night and 8 o’clock in the morning. And quite right too.

Thus, pretty easy to distinguish different users, without complex checkuser software. Very evil, moderators please move or delete if you feel this may get into the wrong hands.
wikiwhistle
QUOTE(Peter Damian @ Mon 7th January 2008, 9:04am) *

Here is a nasty trick I just used to find out whether two accounts were the same user or not (in fact they weren't).


Ooooh clever. Would it be very reliable if one or both of the accounts had far fewer edits than the ones you compared?
Poetlister
There's one obvious gaping flaw. Suppose you only use one account when you are at home and another when you are out - say at work or in a library or Internet cafe. The diurnal patterns would be radically different.

Moulton
QUOTE(Poetlister @ Mon 7th January 2008, 8:38am) *
There's one obvious gaping flaw. Suppose you only use one account when you are at home and another when you are out - say at work or in a library or Internet cafe. The diurnal patterns would be radically different.

In fact they would be so radically different that they would have zero overlap in time. It's like Clark Kent is never seen when Superman is around. The absence of any overlap in time would be yet another clue that the two characters are played by the same actor.
jorge
Wasn't there a tool that displayed the time editing patterns of users automagically?
AB
And what of those who work the night shift, or have irregular
work schedules? And how would you tell an absence for
sleep from an absence for work at a place where one does
have access to the internet? What of self-employed people?
What of those who rise early in the morning and go to bed
early, vs. those who sleep in late and stay up late?
Peter Damian
QUOTE
[wikiwhistle] Ooooh clever. Would it be very reliable if one or both of the accounts had far fewer edits than the ones you compared?


Probably need about 1,000 edits to be statistically reliable. But most serious accounts are considerable multiples of that. Which means, if you are testing for unfair sockpuppet blocking, where the SP tends to have a short life, it's hard to determine whether it was fair or not. For example, I was trying to determine whether the alleged sockpuppets of HeadleyDown were genuinely sockpuppets. Almost impossible to tell.

Another technique (which I borrowed from the stylistic analysis used to tell whether the same author was responsible for different medieval manuscripts) is to look at repeated phrases, particularly in comment fields.
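A rough sketch of the repeated-phrase idea, for what it is worth. This is not a real stylometric method, just a count of word triples that recur in each account's edit summaries; the function names and the sample summaries are invented for illustration.

```python
from collections import Counter


def phrases(comment, n=3):
    """All runs of n consecutive words in one edit summary, lower-cased."""
    words = comment.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def repeated_phrases(comments, n=3, min_count=2):
    """Phrases that occur at least min_count times across an account's summaries."""
    counts = Counter(p for c in comments for p in phrases(c, n))
    return {p for p, k in counts.items() if k >= min_count}


def shared_phrases(comments_a, comments_b, n=3):
    """Phrases that both accounts use repeatedly."""
    return repeated_phrases(comments_a, n) & repeated_phrases(comments_b, n)


# Invented edit summaries, not taken from any real account.
account_a = ["rv per talk page", "rv per talk page consensus", "fix typo"]
account_b = ["cleanup, rv per talk", "again rv per talk", "fix typo"]
common = shared_phrases(account_a, account_b)
```

Stock phrases that everybody uses will show up as shared too, so a hit list like this still needs a human to look at it.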

QUOTE
[Poetlister] There's one obvious gaping flaw. Suppose you only use one account when you are at home and another when you are out - say at work or in a library or Internet cafe. The diurnal patterns would be radically different.


QUOTE
[Moulton] In fact they would be so radically different that they would have zero overlap in time. It's like Clark Kent is never seen when Superman is around. The absence of any overlap in time would be yet another clue that the two characters are played by the same actor.


Agree with Moulton here. It's not something you could write a program to analyse, not easily anyway. It takes human judgment, but, as Moulton says, it would be bleeding obvious. The strongest test is the 'sleeping' one because most humans, even the saddest wikiholics, need a fixed amount of sleep at the same time. Stuff like usage at work would show up in the Clark Kent/Superman way, as Moulton suggests. But in any case, I don't see an obvious reason for using one account in one place, another in another. The usual reason for different accounts is either 1. to have apparently different editors arguing the same POV, in which case both accounts are needed at the same time or 2. to separate accounts by topics or persona, in which case the same argument applies.
Moulton
The main reason would be to have different accounts associated with different IPs.
Peter Damian
QUOTE(AB @ Mon 7th January 2008, 2:19pm) *

And what of those who work the night shift, or have irregular
work schedules? And how would you tell an absence for
sleep from an absence for work at a place where one does
have access to the internet? What of self-employed people?
What of those who rise early in the morning and go to bed
early, vs. those who sleep in late and stay up late?



Agree this is not going to be infallible. All I can say is it definitely worked in answering one particular question I had. Absence due to work will result in two 'humps', because the sleep hump is usually very identifiable. On the rising early thing, agreed, it will only work +/- two hours. But that is sufficient to differentiate time zones. And in any case, statistically there are two things we distinguish: whether two things are definitely the same, or whether two things are definitely different. All this needs to be factored in.
AB
Additionally, it is quite common for different people who live
in the same time zone to have the same/similar work/sleep
schedules.

A skilled sockpuppeteer could easily adjust for this by simply
starting one sockpuppet a few hours earlier than the other,
and putting away that same sockpuppet a few hours earlier.
jorge
Peter to be honest, these things you are going over are pretty common things known to us "sock spotters" but if you are relatively new to wikipediatrics then that would explain why you are just discovering the various techniques.
Peter Damian
QUOTE(jorge @ Mon 7th January 2008, 2:31pm) *

Peter to be honest, these things you are going over are pretty common things known to us "sock spotters" but if you are relatively new to wikipediatrics then that would explain why you are just discovering the various techniques.


Oh yes indeed I'm sure. But this is pretty easy to do using common household implements such as Excel. I am, as you say, new to this, er, hobby. Most of the time when I was actually at the encyclopedia I concentrated on writing good quality articles on academic/useful subjects, from scratch!

QUOTE

Additionally, it is quite common for different people who live
in the same time zone to have the same/similar work/sleep
schedules.


Not that I've seen so far. They all seem to be different in a finger-printy sort of way.

AB
QUOTE(Peter Damian @ Mon 7th January 2008, 2:41pm) *
QUOTE(AB @ Mon 7th January 2008, 2:31pm) *
Additionally, it is quite common for different people who live
in the same time zone to have the same/similar work/sleep
schedules.


Not that I've seen so far. They all seem to be different in a finger-printy sort of way.


And how would you be able to tell? If different people living in the
same time zone had the same/similar work/sleep schedules, or if
they were secretly one person?
Peter Damian
QUOTE(AB @ Mon 7th January 2008, 2:47pm) *

And how would you be able to tell? If different people living in the
same time zone had the same/similar work/sleep schedules, or if
they were secretly one person?


If that were the case, you wouldn't be able to tell. Similarly, if two different people had the same fingerprint pattern, you wouldn't be able to tell either. But as far as we know, different people have different fingerprints, and as far as I can tell from limited use of this, different people tend to have very different editing patterns. It all comes down to probability. If two accounts show very similar editing patterns (rate per hour per time of day &c) then they are probably the same person. If they show different patterns they are probably different.
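One way to put a number on "very similar editing patterns" is to compare the two accounts' hour-by-hour profiles. The sketch below uses cosine similarity, which is my choice for illustration and not something anyone in this thread has tested; the function names are made up, and the input is assumed to be day fractions between 0 and 1.

```python
import math


def hourly_profile(fractions):
    """Share of an account's edits falling in each of the 24 UTC hours."""
    counts = [0] * 24
    for f in fractions:
        counts[int(f * 24) % 24] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def cosine_similarity(p, q):
    """1.0 for identically shaped profiles, 0.0 for profiles with no shared hours."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0
```

A high score only says the two profiles have the same shape. It cannot separate one person from two people who happen to keep the same hours, and with few edits the score is mostly noise.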

This actually is a very sad subject. Why are we talking about this?
AB
QUOTE(Peter Damian @ Mon 7th January 2008, 3:02pm) *
If that were the case, you wouldn't be able to tell. Similarly, if two different people had the same fingerprint pattern, you wouldn't be able to tell either. But as far as we know, different people have different fingerprints, and as far as I can tell from limited use of this, different people tend to have very different editing patterns. It all comes down to probability. If two accounts show very similar editing patterns (rate per hour per time of day &c) then they are probably the same person. If they show different patterns they are probably different.


Where 'probably' means 'greater than 50% chance'.

However, it might be instructive to look around, not on the
internet, but in real life around you. Consider a group of
people who have the same job, and the employer has a
rather strict work schedule, just one shift. Starts early in
the morning, so they are all early risers. And, because they
tire out eventually, they all go to bed early. Similar typing
speeds, because typing is required for the job. The company
has one lunch break.

Can you see, how, just with this one group of people, you
have very similar schedules? Of course, standard work hours
are likely to be more or less the same throughout the time
zone, though of course there will be plenty of people working
non-standard work hours.

QUOTE(Peter Damian @ Mon 7th January 2008, 3:02pm) *
This actually is a very sad subject. Why are we talking about this?


Because you wanted to share your technique with us, I believe.
thekohser
QUOTE(Peter Damian @ Mon 7th January 2008, 4:04am) *

...Format the text strings. Assuming these are in column A, in column B put the formula =value(left(A1, 5)). This computes the ‘day fraction’ from the timestamp, i.e. midnight is 0, midday is 0.5, 9 o’clock in the morning is 0.375. Copy this formula down.

Finally, sort the two columns by column B, so that you get all the edits in order, regardless of date, from midnight through to 23:59. Then graph it.

What you then get is a characteristic hockey stick graph. Even Wikiholics must sleep, and at that point the graph gets very steep – for those who never edit in the wee hours, it will jump up from the time they go to bed, then level off at the time they get up. There may be other humps corresponding perhaps to the time they go to work, but I haven’t seen this so far. Wikiholics just edit all through the day, pausing only for a few hours sleep.


If the WR community could please check if I did this correctly, the chart for JzG is found here.

It would appear that he gets about 6.72 hours of sleep each night, between UTC fractional times of 0.04 and 0.32. Outside of that "Sleepy time", the man is editing (35-degree angle) like a machine, without any discernable variation for employment, lunch, dinner, or time with the family.

Phew -- it is a chart like this one that should be presented to academic societies related to Sociology or Psychology. How could anyone say that Wikipedia is not an addictive product? The cult-like aspects could also be explored. We saw the University of Minnesota explore a quantitative study of "damaged edits and views". Why haven't we yet seen an academic study of "addictive edits and views"?

I'm equally guilty as charged in terms of the time distribution, but not the volume of edits. I think I racked up about 1,400 edits ever across all accounts on Wikipedia in about 3 years' time, and I'm approaching that same number here at WR in about half the time. Guy Chapman cranked out 5,000 edits in just a few months' time.

Greg
Peter Damian
QUOTE(AB @ Mon 7th January 2008, 3:10pm) *

Where 'probably' means 'greater than 50% chance'.


Not really. Greater than 90% = probable, less than 10%= not.

QUOTE
Similar typing speeds, because typing is required for the job. The company
has one lunch break.


Typing speed definitely not the same as editing speed. Editing speed influenced by how much time one has at a particular moment, how much time needed to consider response. Also how much offline editing is done. I tend to write a whole article offline, then dump in one go.

QUOTE

Can you see, how, just with this one group of people, you
have very similar schedules? Of course, standard work hours
are likely to be more or less the same throughout the time
zone, though of course there will be plenty of people working
non-standard work hours.


Taking the times themselves, agreed. But it is editing rate we are talking about. Some people edit during their own sleeping hours. Some edit through breakfast. Some edit through the 10 o'clock news. These are all lifestyle choices that differ from person to person.

dogbiscuit
QUOTE(AB @ Mon 7th January 2008, 2:47pm) *

And how would you be able to tell? If different people living in the
same time zone had the same/similar work/sleep schedules, or if
they were secretly one person?


If we consider the likes of Slim, where there is some evidence that she sockpuppets, and that she uses some form of IP obfuscation, and she sometimes edits for very long stretches, then you can see that working it out is tricky. A fine example of one rule for the trusties, where because they are good, they are allowed to hide on the grounds of privacy, whereas suspected socks are evil and hiding for presumed nefarious reasons.

There are so many ways of gaming the system that you can understand the paranoia of the investigators - why they have to believe that their sleuthing methods are more scientific than they are. After all, all you need is a friend and a different IP address to get a few innocent edits to overlap every now and again - or as we see from Guy, the technology exists to put edits through a bot - so it is trivial to load up a bot to put edits through from a sock at the same time as editing a main account, or vice versa. Bot edits allow all sorts of games, so I am surprised it is allowed at all (except for Trusted Users of course - I forgot).

thekohser
QUOTE(Peter Damian @ Mon 7th January 2008, 9:41am) *

Most of the time when I was actually at the encyclopedia I concentrated on writing good quality articles on academic/useful subjects, from scratch!

Peter, why would you use such a complex phrase as "from scratch", when you could have more simply said ab initio?

Peter Damian
QUOTE(thekohser @ Mon 7th January 2008, 3:22pm) *

Peter, why would you use such a complex phrase as "from scratch", when you could have more simply said ab initio?


Yes, you are right. But I could not remember where Aquinas says this.


QUOTE(thekohser @ Mon 7th January 2008, 3:16pm) *


If the WR community could please check if I did this correctly, the chart for JzG is found here.

It would appear that he gets about 6.72 hours of sleep each night, between UTC fractional times of 0.04 and 0.32. Outside of that "Sleepy time", the man is editing (35-degree angle) like a machine, without any discernable variation for employment, lunch, dinner, or time with the family.

Phew -- it is a chart like this one that should be presented to academic societies related to Sociology or Psychology. How could anyone say that Wikipedia is not an addictive product? The cult-like aspects could also be explored. We saw the University of Minnesota explore a quantitative study of "damaged edits and views". Why haven't we yet seen an academic study of "addictive edits and views"?

I'm equally guilty as charged in terms of the time distribution, but not the volume of edits. I think I racked up about 1,400 edits ever across all accounts on Wikipedia in about 3 years' time, and I'm approaching that same number here at WR in about half the time. Guy Chapman cranked out 5,000 edits in just a few months' time.

Greg


I couldn't get your spreadsheet to open, but it sounds right. Guy clearly lives in London (but we know that). What we are missing is the edit rates per hour - you need to select two fractional times corresponding to an hour's difference, then find the number of edits.
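The per-hour count can be sketched like this. The names are hypothetical, the input is assumed to be day fractions between 0 and 1, and dividing by the number of days covered is an assumption added here so that the figure reads as a rate and not as a raw total.

```python
def edits_in_window(fractions, start, end):
    """Count edits with start <= fraction < end.

    If end < start the window is taken to wrap past midnight.
    """
    if start <= end:
        return sum(1 for f in fractions if start <= f < end)
    return sum(1 for f in fractions if f >= start or f < end)


def edits_per_hour(fractions, hour, n_days):
    """Average number of edits made in the given UTC hour, per day covered."""
    start = hour / 24
    end = ((hour + 1) % 24) / 24
    return edits_in_window(fractions, start, end) / n_days


# Made-up day fractions spread over two days.
sample = [0.50, 0.51, 0.52, 0.60, 0.99]
midday_rate = edits_per_hour(sample, 12, n_days=2)
```

Here three of the five sample edits fall between 12:00 and 13:00, so over two days that is 1.5 edits per hour for that slot.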
east.718
You guys realize that Wikipedia's edit counter does the same thing, right?
Peter Damian
QUOTE(east.718 @ Mon 7th January 2008, 3:36pm) *

You guys realize that Wikipedia's edit counter does the same thing, right?


Obviously we didn't. Thank you

[Edit] just tried and it turns out there is something called 'Opt in' - is it private? Also, while it breaks down the different kinds of contribution, does it show the total editing activity? Couldn't find that.
thekohser
In the interest of fair play, the edit curve for Thekohser (673 edits) is found here.

I at least seem to have had editing spurts during lunch hour, after work, and before bed. Not exactly a linear angle -- but still frighteningly dispersed enough to cause concern for myself.

EDIT: It would be nice if we could easily take out weekends and holidays from the analysis, which I'm sure could be done with some kind of calendar-based filter. Weekend editing during otherwise "normal business hours" would have the effect of making you look like you edit all the waking day.
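A calendar-based filter of that sort might look like the sketch below. The function name and the `holidays` parameter are made up, and note the catch: the timestamps are UTC, so the "weekday" here is the UTC weekday, which can be off by a day from the editor's local one near midnight.

```python
from datetime import date, datetime


def weekdays_only(timestamps, holidays=frozenset()):
    """Keep edits made Monday to Friday (by UTC date) and not on a listed holiday."""
    return [
        t for t in timestamps
        if t.weekday() < 5 and t.date() not in holidays
    ]


# Made-up sample: a Saturday, a Sunday, a Monday and a Tuesday in January 2008.
sample = [
    datetime(2008, 1, 5, 14, 0),
    datetime(2008, 1, 6, 10, 0),
    datetime(2008, 1, 7, 9, 4),
    datetime(2008, 1, 8, 9, 30),
]
working_days = weekdays_only(sample, holidays={date(2008, 1, 8)})
```

Which dates count as holidays depends on where the editor lives, so that list has to be supplied by hand.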

Greg
Peter Damian
QUOTE(thekohser @ Mon 7th January 2008, 3:42pm) *

In the interest of fair play, the edit curve for Thekohser (673 edits) is found here.

I at least seem to have had editing spurts during lunch hour, after work, and before bed. Not exactly a linear angle -- but still frighteningly dispersed enough to cause concern for myself.

Greg


About 40 edits an hour, right? Compared to Guy's 120.
Yehudi
QUOTE(Moulton @ Mon 7th January 2008, 1:56pm) *

QUOTE(Poetlister @ Mon 7th January 2008, 8:38am) *
There's one obvious gaping flaw. Suppose you only use one account when you are at home and another when you are out - say at work or in a library or Internet cafe. The diurnal patterns would be radically different.

In fact they would be so radically different that they would have zero overlap in time. It's like Clark Kent is never seen when Superman is around. The absence of any overlap in time would be yet another clue that the two characters are played by the same actor.

How would you distinguish that from two different people, one of whom only edits from home and another who, not having broadband at home, only edits from elsewhere?

QUOTE(Peter Damian @ Mon 7th January 2008, 2:19pm) *

Agree with Moulton here. It's not something you could write a program to analyse, not easily anyway. It takes human judgment, but, as Moulton says, it would be bleeding obvious.

Aha! A budding Durova.

QUOTE(Peter Damian @ Mon 7th January 2008, 2:23pm) *

Absence due to work will result in two 'humps', because the sleep hump is usually very identifiable.

Only if people get up very early and edit before going to work. What of the person who rushes off in the morning but has plenty of time in the evening? And don't forget to allow for day of the week. Maybe they edit all day at weekends, or at least at the time they'd be editing if at work.



QUOTE(Peter Damian @ Mon 7th January 2008, 3:19pm) *

QUOTE(AB @ Mon 7th January 2008, 3:10pm) *

Where 'probably' means 'greater than 50% chance'.


Not really. Greater than 90% = probable, less than 10%= not.

Who was it said "When I use a word, it means whatever I choose it to mean"?
thekohser
QUOTE(Peter Damian @ Mon 7th January 2008, 10:46am) *

QUOTE(thekohser @ Mon 7th January 2008, 3:42pm) *

In the interest of fair play, the edit curve for Thekohser (673 edits) is found here.

I at least seem to have had editing spurts during lunch hour, after work, and before bed. Not exactly a linear angle -- but still frighteningly dispersed enough to cause concern for myself.

Greg


About 40 edits an hour, right? Compared to Guy's 120.

According to the Wikipedia User Edit Counter, Thekohser had 0.66 edits per DAY. And JzG has 58.83 per day.

I wonder if this Wikipedia tool is dividing by the number of days between "first edit" and "current date". If so, that would have serious consequences on looking at averages for banned users, or users who went on extended wiki-breaks.

Anyway, I'm thinking that the provocative thing that's going to come out of all of this is not the use of such data to identify sockpuppets, but rather to identify Wikipedia (and Wikipedia Review) addicts. We all need some professional help!

Greg
Moulton
QUOTE(Yehudi @ Mon 7th January 2008, 10:53am) *
How would you distinguish that from two different people, one of whom only edits from home and another who, not having broadband at home, only edits from elsewhere?

Two different people, editing independently of each other for months and months, would almost surely have occasions where both are editing at the same time. One person, editing sometimes from home and at other times from another locale, would never have any periods of overlap.
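The overlap check can be sketched as a count of edits from one account that fall within a few minutes of an edit from the other. The ten-minute window is an arbitrary assumption, as are the function name and the sample data.

```python
from bisect import bisect_left
from datetime import datetime, timedelta


def count_near_simultaneous(edits_a, edits_b, window=timedelta(minutes=10)):
    """Number of edits in A that have at least one edit in B within `window`."""
    b_sorted = sorted(edits_b)
    hits = 0
    for t in edits_a:
        # First edit in B that is not earlier than (t - window).
        i = bisect_left(b_sorted, t - window)
        if i < len(b_sorted) and b_sorted[i] <= t + window:
            hits += 1
    return hits


# Made-up timestamps for two hypothetical accounts.
account_a = [datetime(2008, 1, 7, 9, 0), datetime(2008, 1, 7, 15, 0)]
account_b = [datetime(2008, 1, 7, 9, 5), datetime(2008, 1, 8, 15, 0)]
overlaps = count_near_simultaneous(account_a, account_b)
```

A count of zero over months of heavy editing would be the Clark Kent pattern. A count above zero does not settle it either way, since a friend or a bot could produce overlapping edits.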
Peter Damian
QUOTE(thekohser @ Mon 7th January 2008, 3:58pm) *

According to the Wikipedia User Edit Counter, Thekohser had 0.66 edits per DAY. And JzG has 58.83 per day.

Greg


I mistakenly forgot to divide by the number of days. And I still can't get the wretched 'User Edit Counter' to work.


QUOTE(thekohser @ Mon 7th January 2008, 3:58pm) *


I wonder if this Wikipedia tool is dividing by the number of days between "first edit" and "current date". If so, that would have serious consequences on looking at averages for banned users, or users who went on extended wiki-breaks.

Greg


And you're quite right, one needs to determine the periods when a user is active. Or do we? The fact someone takes a Wikibreak suggests they are not an addict, ergo, one should include all days.
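The two ways of averaging can be put side by side. This is a sketch with made-up names and data; whether the edit counter tool really divides by the span from first edit to the current date is Greg's guess, not something confirmed here.

```python
from datetime import date, datetime


def edits_per_day_over_span(timestamps, until):
    """Edits divided by every calendar day from the first edit up to `until`."""
    first = min(t.date() for t in timestamps)
    days = (until - first).days + 1
    return len(timestamps) / days


def edits_per_active_day(timestamps):
    """Edits divided only by the days on which at least one edit was made."""
    active = {t.date() for t in timestamps}
    return len(timestamps) / len(active)


# Made-up sample: four edits on two days, then silence.
sample = [
    datetime(2007, 1, 1, 9, 0),
    datetime(2007, 1, 1, 9, 30),
    datetime(2007, 1, 2, 20, 0),
    datetime(2007, 1, 2, 21, 0),
]
span_rate = edits_per_day_over_span(sample, until=date(2007, 1, 10))
active_rate = edits_per_active_day(sample)
```

On this sample the span average is 0.4 edits a day and the active-day average is 2.0, which is the sort of gap a ban or a long wiki-break would open up.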
Yehudi
QUOTE(Moulton @ Mon 7th January 2008, 4:02pm) *

Two different people, editing independently of each other for months and months, would almost surely have occasions where both are editing at the same time. One person, editing sometimes from home and at other times from another locale, would never have any periods of overlap.

Two different people, editing independently of each other for months and months, one person editing from home and the other from another locale, would probably never have any periods of overlap.
Nathan
This is a much better way of checking without the standard "Oh you sound like User A so I can magically deduce you're the same user" BS, which is oftentimes wrong.
Moulton
QUOTE(Yehudi @ Mon 7th January 2008, 11:13am) *
Two different people, editing independently of each other for months and months, one person editing from home and the other from another locale, would probably never have any periods of overlap.

Do you honestly believe that?
LamontStormstar
This idea assumes each account is editing a lot, but if they're editing sparsely, have irregular sleep schedules, or even one edits for say 24 hours straight due to obsession, then things are messed up.