Download as txt, pdf, or txt
Download as txt, pdf, or txt
You are on page 1of 2

This zip file contains ...

1. Source code for the freeware component - TDiff - written in


Borland's Delphi programming language. This component dramatically
simplify programming tasks that require calculations of 'shortest path' or
'longest common sequence' as typically required in file compare utilities.

2. Two very simple demo applications (with full source code) that
demonstrate how the 'TDiff' component can be used in Delphi programs.

3. A fully featured text compare utility 'TextDiff' (without source code)


which showcases the speed and accuracy of the TDiff component. The
utility (and accompanying help file) requires no installation apart from
extracting from this zip file.

4. PDF documents fully describing the principle algorithms used in both the
new and the older TDiff component ...
The original TDiff component was based on:
"An O(ND) Difference Algorithm and its Variations"
by E Myers - Algorithmica Vol. 1 No. 2, 1986, pp. 251-266
The current TDiff component is based on:
"An O(NP) Sequence Comparison Algorithm"
by Sun Wu, Udi Manber & Gene Myers

------------------------------------------------------------------------------

## TDiff Component ##
Version : 4.1
Last updated : 28 May 2008
Compilers : Delphi 7
Author : Angus Johnson - angusj-AT-myrealbox-DOT-com
Copyright : � 2001-2008 Angus Johnson

History:
13 December 2001 - Original release (used Myer's O(ND) Difference Algorithm)
22 April 2008 - Complete rewrite to greatly improve the code and
provide a much simpler view of differences through a new
'Compares' property.
21 May 2008 - Another complete code rewrite to use Sun Wu et al.'s
O(NP) Sequence Comparison Algorithm which more than
halves times of typical comparisons.
24 May 2008 - Reimplemented "divide-and-conquer" technique (which was
omitted in 21 May release) so memory use is again minimal.
25 May 2008 - Removed recursion to avoid the possibility of running out
of stack memory during massive comparisons.
2 June 2008 - Minor bugfixes.
------------------------------------------------------------------------------

Why did I changed difference algorithms?


The short answer is that the new one is faster.

The original "D" algorithm has been replaced by the "P" algorithm
(see accompanying PDF documents for their full descriptions). Both
algorithms accurately derive a "shortest edit script" or "edit path" (and
conversely a "longest common subsquence"), but the "P" algorithm is
significantly faster. However, on their own, both algorithms require
exponential amounts of memory to map out an optimal "edit path". To
circumvent this problem of consuming very large amounts of memory, a
'divide-and-conquer' technique can be used. By applying the difference
algorithms simultaneously from both ends of the difference arrays, a
midpoint of an optimal "edit path" can be derived. Then by repeatedly finding
"edit path" midpoints of subsections of a whole "edit path", an entire
"edit path" can be defined. However, only the original "D" algorithm is
optimally suited to this 'divide-and-conquer' approach. Applying the new "P"
algorithm simultaneously from both ends of difference arrays does not alway
reveal a midpoint that's exactly on an optimal "edit path". This means that
very occasionally matches within surrounding differences may get missed.
Fortunately in practice these missed 'matches' are of little or no
significance as they're always embedded within larger blocks of changes.
As an example, consider the following 2 phrases - "a child swims in the creek"
and "a frog hopped in the creek". The two phrases share several matching
words - "a ... in the creek" - and also a matching space between the two
changed words. In a more complex example the faster "P" algorithm using
the 'divide-and-conquer' memory sparing technique might miss that matching
space and report that the dissimilarities encompass both changed words and
that space. Most importantly however, while matches may very occasionally be
missed, there's no risk of differences being missed when using the "P"
algorithm with 'divide-and-conquer'. Anyhow, I believe the speed benefits of
the new "P" algorithm generally outweigh an occasional missed match.
Nevertheless, in case these occasional missed matches are considered
important by some, I've included the older TDiff implementation
(using the "D" algorithm) in this package so it can still be used
instead of the newer implementation.

------------------------------------------------------------------------------

## TextDiff utility ##
Version : 4.5
Last updated : 2 June 2008
Copyright : � 2001-2008 Angus Johnson
License : Freeware
Installation : Just extract the TextDiff.exe and TextDiff.hlp files and run.

While now a fairly comprehensive file & folder compare/merge utility, it was
originally written to showcase the speed and accuracy of the TDiff component.

------------------------------------------------------------------------------

Licence to use, terms and conditions:


The source code in this package is released as freeware
provided you agree to the following terms & conditions:
1. copyright notices, terms and conditions are left unchanged.
2. modifications to the code by other authors must be
clearly documented and accompanied by the modifier's name.
3. the code may be freely compiled into binary format
and while acknowledgement is appreciated, it isn't required.

------------------------------------------------------------------------------

You might also like