Find and Replace Words in Documents using Java

In an earlier article, we discussed how to redact words in documents as a .NET developer. Redaction is used to erase sensitive content and to hide or remove private information such as email addresses or identification numbers. This article discusses how to redact Word, PDF, and other documents programmatically using Java. We will look separately at how to find and replace text, words, or phrases with different techniques using the Java API for redaction.

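The following topics are covered below:

  • Java API for Redaction and Replacing Text
  • Find and Replace a Word or Phrase using Java
  • Find and Replace Case-Sensitive Word or Phrase using Java
  • Replace Text using Regular Expressions (RegEx) in Java
  • Replace the Text with Colored Box in Java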

Java API for Redaction and Replacing Text

GroupDocs.Redaction provides a Java redaction API that allows finding and replacing data in documents of various formats. In addition to text redaction and rasterization, the API supports metadata, annotation, spreadsheet, and image redaction. The supported file formats, covering Word documents, spreadsheets, presentations, images, and PDF documents, are listed in the documentation.

Download or Configure

You may download the JAR file from the downloads section, or add the following repository and dependency configuration to the pom.xml of your Maven-based Java application.

<repository>
	<id>GroupDocsJavaAPI</id>
	<name>GroupDocs Java API</name>
	<url>https://repository.groupdocs.com/repo/</url>
</repository>
<dependency>
	<groupId>com.groupdocs</groupId>
	<artifactId>groupdocs-redaction</artifactId>
	<version>21.6</version>
</dependency>

MS Office, a PDF editor, or any other third-party software is not required in this process. Let’s now look at different approaches to searching and replacing text within documents. The following is a screenshot of the Word document that is used in the examples below. You can use the same methods for other document formats without any change in the code.

Document to redact text

Find and Replace a Word or Phrase using Java

The following steps explain how to find and then replace the occurrences of a word or phrase in a Word, PDF, or any other document within a Java application.

  • Load the document using the Redactor class.
  • Find the exact phrase or word using the ExactPhraseRedaction and ReplacementOptions classes.
  • Use the apply method of Redactor to apply the redaction.
  • To save the file at a different location after making changes, use an output stream.
  • Save the redaction changes using the save method.

The following code finds and replaces the phrase “John Doe” in the above Word document using Java. It replaces all the occurrences of “John Doe” with the text “[censored]”.
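
As a rough illustration of the matching itself, the sketch below applies the same replacement to a plain string rather than to a document. It uses only the Java standard library and does not call the GroupDocs API; the class ExactPhraseSketch and its method are hypothetical names, and it assumes that the exact-phrase search ignores letter case unless told otherwise.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of GroupDocs.Redaction.
final class ExactPhraseSketch {

    // Replaces every occurrence of `phrase` in `text`, ignoring letter case.
    static String replaceIgnoringCase(String text, String phrase, String replacement) {
        // Pattern.quote makes the phrase literal, so '.' or '(' are not treated as regex syntax.
        Pattern pattern = Pattern.compile(Pattern.quote(phrase),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        // quoteReplacement stops '$' and '\' in the replacement from being read as group references.
        return pattern.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String text = "John Doe signed the form. Contact john doe by email.";
        // Prints: [censored] signed the form. Contact [censored] by email.
        System.out.println(replaceIgnoringCase(text, "John Doe", "[censored]"));
    }
}
```

Quoting both the phrase and the replacement keeps the sketch a literal phrase search, which is the behaviour an exact-phrase redaction is meant to have.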

The output of the code is as follows.

Redact using Exact Phrase

Find and Replace Case-Sensitive Word or Phrase using Java

If the exact letter case matters, you can replace only the occurrences that match your case-sensitive search. The following code replaces only the exact case-sensitive matches of the phrase “John Doe” using Java.
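
To show what a case-sensitive match means in practice, here is a minimal plain-Java sketch on a string. It is not the GroupDocs API; CaseSensitiveSketch is a hypothetical name used only for this illustration.

```java
// Hypothetical helper for illustration only; not part of GroupDocs.Redaction.
final class CaseSensitiveSketch {

    // Replaces only the occurrences whose letter case matches `phrase` exactly.
    static String replaceExactCase(String text, String phrase, String replacement) {
        // String.replace does a literal, case-sensitive replacement of every occurrence.
        return text.replace(phrase, replacement);
    }

    public static void main(String[] args) {
        String text = "John Doe signed the form. Contact john doe by email.";
        // Prints: [censored] signed the form. Contact john doe by email.
        System.out.println(replaceExactCase(text, "John Doe", "[censored]"));
    }
}
```

The lowercase “john doe” is left untouched, which is the difference from a search that ignores case.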

The output of the code is as follows.

Case sensitive redaction

Replace Text using Regular Expressions (RegEx) in Java

If you want to replace not an exact word but a pattern that occurs in your document, you can use regular expressions. The following steps allow you to find and replace any pattern of text using regular expressions (RegEx) in your Java applications.

  • Load the document using the Redactor class.
  • Create the RegEx using RegexRedaction.
  • Provide the replacement text for the RegEx matches using ReplacementOptions.
  • Use the apply method to replace all the regex matches.
  • Use the save method to get the redacted document.

The following code shows how to find the exact pattern using RegEx and replace it with some other text using Java.
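
The pattern-based replacement can be sketched on a plain string with java.util.regex. This is not the GroupDocs API: RegexReplaceSketch is a hypothetical name, and the pattern (two digits, two digits, six digits, separated by hyphens) is an assumed example of an identification number format, not one taken from the sample document.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of GroupDocs.Redaction.
final class RegexReplaceSketch {

    // Assumed ID format for this example: NN-NN-NNNNNN.
    static final Pattern ID_PATTERN = Pattern.compile("\\b\\d{2}-\\d{2}-\\d{6}\\b");

    // Replaces every match of `pattern` in `text` with the literal `replacement`.
    static String replaceMatches(String text, Pattern pattern, String replacement) {
        return pattern.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String text = "ID 12-34-567890 issued.";
        // Prints: ID [censored] issued.
        System.out.println(replaceMatches(text, ID_PATTERN, "[censored]"));
    }
}
```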

The following is the output of the above code:

RegEx Redaction

Replace the Text with Colored Box in Java

If you do not want to replace your content and just want to hide it, the API allows you to cover the matched text by drawing a box over it. The following code hides the text with a black rectangle using Java.
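
The idea of covering a match instead of replacing it can be sketched with standard Java 2D on an in-memory page image. This is not the GroupDocs API: BoxCoverSketch, the blank white page, and the box coordinates are all assumptions made for illustration, standing in for a rendered page and the bounds of a text match.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;

// Hypothetical helper for illustration only; not part of GroupDocs.Redaction.
final class BoxCoverSketch {

    // Creates a white page image that stands in for a rendered document page.
    static BufferedImage blankPage(int width, int height) {
        BufferedImage page = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        coverWithBox(page, 0, 0, width, height, Color.WHITE);
        return page;
    }

    // Draws a filled rectangle over the given region, hiding whatever was there.
    static void coverWithBox(BufferedImage page, int x, int y, int width, int height, Color color) {
        Graphics2D g = page.createGraphics();
        try {
            g.setColor(color);
            g.fillRect(x, y, width, height);
        } finally {
            g.dispose(); // release the graphics context
        }
    }

    public static void main(String[] args) {
        BufferedImage page = blankPage(100, 50);
        // Assumed bounds of a text match: x=10, y=10, 30 wide, 12 high.
        coverWithBox(page, 10, 10, 30, 12, Color.BLACK);
        System.out.println(page.getRGB(15, 15) == Color.BLACK.getRGB()); // prints true
    }
}
```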

The output of the above code is as follows.

Hide Text using Box

Get a Free API License

You can get a free temporary license to use the API without the evaluation limitations.

Conclusion

To sum up, you learned how to find text using exact phrase search, case-sensitive search, and regular expressions, and how to hide the text with a box instead of replacing it. You can use these techniques to replace the matches in different ways within MS Word, PDF, and other documents. To learn more about the API, visit the documentation. For queries, contact us via the forum.
