Fast Diffs of Wikitext

Fabian Flöck writes on Wiki-research-l: If anyone is interested in faster processing of revision differences, you could also adapt the strategy we implemented for wikiwho, which is to keep track of bigger unchanged text chunks with hashes and diff only the remaining text (usually a relatively small part of the article).

We specifically introduced that technique because diffing all the text was too expensive. In principle, it can produce the same output, although we currently use it for authorship detection, which is a slightly different task. Either way, it is on average more than 100 times faster than pure "traditional" diffing. Maybe that is useful for someone. Code is available on GitHub.
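To make the idea concrete, here is a minimal Python sketch of hash-based chunk skipping. It is not the wikiwho code. It assumes that the "bigger chunks" are paragraphs separated by blank lines, that SHA-1 serves as the chunk hash, and that the leftover text is diffed word by word with the standard library's difflib.SequenceMatcher. The names split_paragraphs, chunk_hash and fast_diff are hypothetical.

```python
import difflib
import hashlib


def split_paragraphs(text):
    # Hypothetical chunking rule: paragraphs separated by a blank line.
    return [p for p in text.split("\n\n") if p.strip()]


def chunk_hash(chunk):
    # Fingerprint of a chunk. Hashes can be stored per revision and reused,
    # so the text of an unchanged chunk never needs to be compared again.
    return hashlib.sha1(chunk.encode("utf-8")).hexdigest()


def fast_diff(old_text, new_text):
    """Diff two revisions, skipping chunks whose hash appears in both."""
    old_chunks = split_paragraphs(old_text)
    new_chunks = split_paragraphs(new_text)
    old_hashes = {chunk_hash(c) for c in old_chunks}
    new_hashes = {chunk_hash(c) for c in new_chunks}

    # Keep only the chunks that have no identical counterpart in the
    # other revision; these are usually a small part of the article.
    old_rest = [c for c in old_chunks if chunk_hash(c) not in new_hashes]
    new_rest = [c for c in new_chunks if chunk_hash(c) not in old_hashes]

    # Run the expensive sequence diff only on the remaining words.
    old_words = " ".join(old_rest).split()
    new_words = " ".join(new_rest).split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)

    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, old_words[i1:i2], new_words[j1:j2]))
    return changes


if __name__ == "__main__":
    old = "Intro paragraph.\n\nThe cat sat on the mat.\n\nClosing paragraph."
    new = "Intro paragraph.\n\nThe cat sat on the red mat.\n\nClosing paragraph."
    print(fast_diff(old, new))  # → [('insert', [], ['red'])]
```

In this example only the middle paragraph reaches SequenceMatcher; the intro and closing paragraphs are discarded after a single hash lookup each. The simplification is that matching ignores position: a paragraph moved elsewhere counts as unchanged, and repeated identical paragraphs are not told apart. A real implementation, including wikiwho's, has to handle such cases more carefully.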