A Python module to annotate two versions of a list with the
values that have been changed between the versions, similar
to Unix's diff but with a dead-simple Python interface.
SimpleDiff can be installed with pip or easy_install:
$ pip install simplediff
You can use test.py to run the included doctests:
$ python test.py
No output means that all the tests have passed.
The module exposes three functions, diff, string_diff,
and html_diff.
diff is the core of the module and provides basic diffing
functionality on list objects. The lists could represent
lines of a file, tokens from a tokenizer, or just about
anything else. diff is agnostic towards the content of the
lists; all it cares about is which elements of the
old and new lists are equal.
>>> from simplediff import diff
Because strings share an accessor interface with lists in
Python, diff automatically works on strings at a character
level.
>>> diff('shark', 'spork')
... # doctest: +NORMALIZE_WHITESPACE
[('=', 's'),
('-', 'ha'),
('+', 'po'),
('=', 'rk')]
The output in this case is quite simple. ('=', 's') indicates
that the s is in both strings. ('-', 'ha') indicates that
ha appears in the first string but not the second.
('+', 'po') indicates that po appears in the second string
but not the first. Finally, rk appears in both.
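To make the meaning of the three markers concrete, here is a minimal sketch that rebuilds both input strings from the result shown above. The helper name rebuild is hypothetical and not part of simplediff; it assumes a character-level diff, so every chunk is a string.

```python
def rebuild(changes):
    """Reassemble the old and new strings from a character-level diff result.

    Hypothetical helper, not part of simplediff. Assumes each chunk is a string.
    """
    # '=' chunks belong to both versions, '-' only to the old one,
    # and '+' only to the new one.
    old = ''.join(chunk for op, chunk in changes if op in ('=', '-'))
    new = ''.join(chunk for op, chunk in changes if op in ('=', '+'))
    return old, new


# The result of diff('shark', 'spork') as shown above.
changes = [('=', 's'), ('-', 'ha'), ('+', 'po'), ('=', 'rk')]
old, new = rebuild(changes)
```

Here old comes back as 'shark' and new as 'spork'.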
This is a bit of an over-simplification, though. Consider this case:
>>> diff('aaaabbb', 'bbbaaaa')
... # doctest: +NORMALIZE_WHITESPACE
[('+', 'bbb'),
('=', 'aaaa'),
('-', 'bbb')]
Note that although 'bbb' is contained in both strings, it is an insertion and then a deletion in the resulting diff. Note also that it was the longer string, 'aaaa', that was chosen as unchanged; SimpleDiff tries to maximize the length of the unchanged data and minimize the number of inserts and deletes.
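One strategy that reproduces this behaviour is to find the longest contiguous run shared by both inputs, mark it as unchanged, and recurse on what lies to its left and right. The sketch below illustrates that idea only; it is not taken from the SimpleDiff source, and the names longest_common_run and sketch_diff are hypothetical.

```python
def longest_common_run(old, new):
    """Return (start_old, start_new, length) of the longest contiguous run
    of equal elements shared by old and new. Hypothetical helper."""
    best = (0, 0, 0)
    # prev[j] holds the length of the common run ending at old[i-2], new[j-1].
    prev = [0] * (len(new) + 1)
    for i in range(1, len(old) + 1):
        curr = [0] * (len(new) + 1)
        for j in range(1, len(new) + 1):
            if old[i - 1] == new[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best[2]:
                    best = (i - curr[j], j - curr[j], curr[j])
        prev = curr
    return best


def sketch_diff(old, new):
    """Illustrative diff: keep the longest shared run, recurse on both sides.

    A sketch under the assumption described above, not the library's code.
    Works on any sliceable sequence (strings or lists).
    """
    start_old, start_new, length = longest_common_run(old, new)
    if length == 0:
        # Nothing in common: everything old is deleted, everything new inserted.
        result = []
        if old:
            result.append(('-', old))
        if new:
            result.append(('+', new))
        return result
    return (sketch_diff(old[:start_old], new[:start_new])
            + [('=', new[start_new:start_new + length])]
            + sketch_diff(old[start_old + length:], new[start_new + length:]))
```

Because 'aaaa' is the longest shared run, it is kept as unchanged, and the 'bbb' on either side of it has nothing left to match against, so it shows up as an insertion and a deletion.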
Since diff's arguments are really lists, you can easily
tokenize a string (in this case, by whitespace) and provide
that as input.
>>> diff('I like icecream'.split(),
... 'I do not like icecream'.split())
... # doctest: +NORMALIZE_WHITESPACE
[('=', ['I']),
('+', ['do', 'not']),
('=', ['like', 'icecream'])]
Note that each chunk in the result is now a list of strings, not a string.
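Splitting on whitespace discards the exact spacing of the input. If the original text needs to be recoverable from the tokens, one option is a tokenizer that keeps whitespace runs as tokens of their own. The function below is a hypothetical helper, not part of simplediff; its output is an ordinary list and could be passed to diff like any other.

```python
import re


def tokenize_keep_whitespace(text):
    """Split text into alternating runs of whitespace and non-whitespace.

    Hypothetical helper, not part of simplediff. Joining the tokens
    gives back the original text exactly.
    """
    return re.findall(r'\s+|\S+', text)


tokens = tokenize_keep_whitespace('I like  icecream')
```

Here tokens is ['I', ' ', 'like', '  ', 'icecream'], and ''.join(tokens) reproduces the input, double space included.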
string_diff and html_diff take strings, tokenize them
on whitespace, and return the diff output or HTML code
respectively. These are meant as example implementations;
they're probably not what you want in practice.
>>> from simplediff import string_diff, html_diff
>>> string_diff('I like icecream',
... 'I do not like icecream')
... # doctest: +NORMALIZE_WHITESPACE
[('=', ['I']),
('+', ['do', 'not']),
('=', ['like', 'icecream'])]
This is the same output as with diff, but note that this
time we didn't have to call .split() on the input strings.
>>> html_diff('I like vanilla cupcakes',
... 'I do not like cupcakes')
... # doctest: +NORMALIZE_WHITESPACE
'I <ins>do not</ins> like <del>vanilla</del> cupcakes'
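If html_diff does not fit your needs, a custom renderer over the output of diff is short to write. The sketch below is an assumption about how such a renderer might look, not the library's implementation. The name render_html is hypothetical, and the HTML escaping is an addition that html_diff is not claimed to perform.

```python
from html import escape


def render_html(changes):
    """Render a token-level diff result as HTML.

    Hypothetical helper, not part of simplediff. Expects chunks that are
    lists of string tokens, as produced by diffing whitespace-split text.
    """
    templates = {
        '=': '{}',
        '+': '<ins>{}</ins>',
        '-': '<del>{}</del>',
    }
    # Escape the token text so that characters such as '<' stay literal.
    return ' '.join(
        templates[op].format(escape(' '.join(tokens)))
        for op, tokens in changes
    )


# The token-level result shown earlier for the icecream example.
changes = [('=', ['I']), ('+', ['do', 'not']), ('=', ['like', 'icecream'])]
html = render_html(changes)
```

For this input, html is 'I <ins>do not</ins> like icecream'.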
See the doctests in simplediff/__init__.py for more
examples.
SimpleDiff is copyright 2008-2012 by Paul Butler. It may
be used and redistributed under a liberal zlib/libpng-like
license provided in the LICENSE file.