Control document comparison sensitivity in Java

Document comparison is one of the most common procedures in digital businesses. The objective is always the same: highlight the inserted or deleted items, detect style changes, and generate a summary. Let’s see how GroupDocs.Comparison for Java can help with this scenario. It is a back-end API that can be integrated into any Java application, irrespective of the framework. Explore the API documentation to learn more about the supported features and file formats.

For those who are already using the API, we’ll discuss the new features and improvements introduced in version 19.10.
How about controlling the document comparison sensitivity? We’ve added a sensitivity property to the ComparisonSettings class. This option defines the limit, as a percentage, at which an element is detected as deleted or inserted.

Minimum value
The minimum value is 0. At this setting, the comparison process does not occur for any length of sequences of the two compared objects.

Default value
The default value is 75. Comparison occurs when the percentage of deleted or inserted elements in relation to all elements does not exceed 75%.

Maximum value
The maximum value is 100. Comparison occurs at any length of a common sub-sequence of the two compared objects.

Now let’s understand this with a use case. Suppose we have two words:

  1. oneSource
  2. twoTarget

These two words have a very small common sub-sequence. Therefore, when they are compared at a sensitivity of 75, the sub-sequence is not taken into account, and the whole word is reported as removed and inserted:

(twoTarget)[oneSource]

But at a sensitivity of 100, this sub-sequence is taken into account and shown in the result, even though it consists of only two letters:

(tw)o[n](Targ)e[Source](t)

Isn’t it amazing? You can now control how detailed the comparison results are just by adjusting the sensitivity.
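
To make the threshold idea concrete, here is a toy model in plain Java. It does not use GroupDocs.Comparison, and it is not the library's actual algorithm. The class and method names (SensitivitySketch, changedPercent, comparesInDetail) are made up for illustration. It also assumes that "all elements" means deleted plus inserted plus common characters, and that the common part is a longest common subsequence, which may differ from what the library picks.

```java
public class SensitivitySketch {

    // Length of the longest common subsequence (classic dynamic programming).
    static int lcsLength(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) == b.charAt(j - 1)) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[a.length()][b.length()];
    }

    // Share of deleted or inserted characters among all characters
    // (deleted + inserted + common). This ratio is an assumption of the sketch.
    static double changedPercent(String source, String target) {
        int common = lcsLength(source, target);
        int changed = (source.length() - common) + (target.length() - common);
        int total = changed + common;
        return total == 0 ? 0.0 : 100.0 * changed / total;
    }

    // Hypothetical decision rule: compare character by character only when
    // the changed share does not exceed the sensitivity limit.
    static boolean comparesInDetail(String source, String target, int sensitivity) {
        return changedPercent(source, target) <= sensitivity;
    }

    public static void main(String[] args) {
        String source = "oneSource";
        String target = "twoTarget";
        System.out.println("changed %: " + changedPercent(source, target));
        System.out.println("detail at 75:  " + comparesInDetail(source, target, 75));
        System.out.println("detail at 100: " + comparesInDetail(source, target, 100));
    }
}
```

In this model, the two words share so little that the changed share exceeds 75, so at the default setting the whole word is replaced. At 100, the small common part is kept and the change is reported character by character.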

Did you ever think of getting the coordinates of document changes or differences? It may sound confusing at first, so let me elaborate. In the output document, you get every detail of inserted, deleted, or style-changed items. What’s new is that you can also get the coordinates where the changes or differences actually occurred. Currently, this feature is supported only for Word, PDF, Slide, and Diagram formats.

You can get the API from the download section. We also have an open-source example project on GitHub. If you face any issue while evaluating the API, you can post it on the forum.
