QUOTE(Kelly Martin @ Tue 29th September 2009, 10:23pm)
They won't enable email notifications for the English Wikipedia (the only project for which they are not enabled) because turning them on would cream the daylights out of the poor little box that handles their email.
They can't use more than one box for email?
QUOTE(dogbiscuit @ Wed 30th September 2009, 10:48am)
The trouble with diff based revisions is that they are at risk from both storage and processing error.
Not any more than the current system. Did you look at HistoryBlob? Have you seen many reports of storage and processing errors in svn?
QUOTE(dogbiscuit @ Wed 30th September 2009, 10:48am)
Also, when you want to do things like removing versions, it is going to be a process of reconstruction then rediffing.
Removing individual versions is relatively rare, and if you use skip-deltas (like Subversion), the process is no more complicated than reading a history version. Finding the record would be O(log n) and once you found the record it'd be O(1). In all likelihood it'd be *faster* than the current system.
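For what it's worth, here's a sketch in Python (my own illustration, not Subversion's actual code; the function names are mine) of why skip-deltas make lookup O(log n):

```python
# Skip-delta scheme (roughly what Subversion's design notes describe):
# revision N stores a delta against revision N with its lowest set bit
# cleared, so any revision can be rebuilt by applying at most log2(N)
# deltas on top of revision 0.

def skip_delta_base(n):
    """Revision that revision n's delta is computed against."""
    return n & (n - 1)  # clear the lowest set bit of n

def delta_chain(n):
    """Revisions whose deltas must be applied, oldest first,
    to rebuild revision n starting from revision 0."""
    chain = []
    while n > 0:
        chain.append(n)
        n = skip_delta_base(n)
    return chain[::-1]

# Rebuilding revision 100 touches only the deltas for 64, 96, and
# 100 -- three delta applications instead of a hundred.
print(delta_chain(100))
```

So removing one version means re-deltifying only the handful of revisions whose chains pass through it, not reconstructing the whole history.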
QUOTE(Milton Roe @ Thu 1st October 2009, 1:05am)
QUOTE(dogbiscuit @ Wed 30th September 2009, 5:36pm)
QUOTE(Milton Roe @ Thu 1st October 2009, 1:05am)
Sure, but it's hard to argue for keeping a full image for a change that didn't last longer than a few minutes, six months ago. That is where all you need is the diff, particularly if it was totally reverted, and never came back.
You don't need anything.
However, the logic they are following is that when they store history in a compressed blob, typically these very similar versions get compressed down to not a lot.
Not unless the differences are subtracted (abstracted) BEFORE they are compressed. And that's the whole question.
No, the compression works well even if you don't do the diff. But it works a lot better if you do the diff.
Concatenating all historical versions of the anarchism article gets you a 902 meg file. Compressing that concatenation gets you from 902 megs down to 51 megs. Using reverse deltas instead gets you from 902 megs to 42 megs, and compressing the reverse delta file takes you the rest of the way down to 3.3 megs. That alone doesn't give you O(log n) random access, though. To get random access you need skip-deltas, which is how Subversion works (I believe it uses forward deltas rather than reverse deltas, though). This is a problem that was solved years ago, but the Wikipedians insist on their own kludgy home-grown solution. This is what happens when you have a great coder and a terrible manager as CTO.
http://blog.p2pedia.org/2008/10/anarchism.html
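If you don't believe the ordering, here's a toy sketch in Python (my own, not the script behind the numbers above; the "delta" is a crude append-only suffix diff rather than a real binary diff like xdelta/vcdiff) showing that deltas-then-compression beats compressing the raw concatenation:

```python
import zlib

# Fake article history: a fixed base text, with one short line
# appended per edit, so each revision extends its predecessor.
base = "The quick brown fox jumps over the lazy dog.\n" * 50
history = []
text = base
for i in range(200):
    text += "edit number %d by some anon\n" % i
    history.append(text)

# "Full images" storage: every revision concatenated.
concat = "".join(history).encode()

# Crude forward delta: full text of the oldest revision, plus the
# suffix each later revision adds. Real systems use proper binary
# diffs, but the storage effect is the same.
deltas = [history[0].encode()]
for prev, cur in zip(history, history[1:]):
    deltas.append(cur[len(prev):].encode())
delta_file = b"".join(deltas)

raw = len(concat)
gz_concat = len(zlib.compress(concat, 9))
gz_deltas = len(zlib.compress(delta_file, 9))
print("raw:", raw, "compressed concat:", gz_concat,
      "compressed deltas:", gz_deltas)
```

Even on this toy history you get the same ordering the blog post reports for anarchism: raw concatenation is biggest, compressing it helps a lot (the redundancy is still there for the compressor to find), and compressing the delta file helps most, because the redundancy has already been subtracted out.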