Regex Search using Java

Regular expressions are patterns that describe the text you want to find, which makes them a precise tool for searching. This article covers the basics of regex search in Java, so you can search documents of different file formats across folders.

Java API for Regex Search in Documents

To perform the regex search, we’ll use the GroupDocs.Search for Java API. Given a regex pattern, it can programmatically search for matching text in files of different formats across folders. Supported file types include Word documents, spreadsheets, presentations, PDF files, markup files, eBooks, email messages, OneNote documents, and ZIP archives.

For the full list of supported file formats, see the documentation.

You have two options for getting the JAR file: download it from the downloads section, or add the latest Maven repository and dependency configuration to your Java application.

Searching in Files with Regex using Java

The following steps show how to run a regex search over multiple files of different formats within folders using Java. Afterwards, you can create a highlighted HTML output file for each document found.

  • Create an Index by specifying the folder path for the index.
  • Add the path of the main folder you want to search to the index you just created.
  • Set up the regex search query.
  • Call the search method to run the search and obtain the results.
  • Iterate over the search results and generate the output you need.
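To make the flow of these steps concrete without depending on any particular library version, here is a minimal sketch that uses only the Java standard library. It is a stand-in, not the GroupDocs.Search API: it builds no index, handles only plain-text `.txt` files, and the class name `FolderRegexSearch` and the method `search` are hypothetical names chosen for illustration. It walks a folder tree, applies a regex to each file, and collects the match count per file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Hypothetical helper: a standard-library stand-in for the search flow, not the GroupDocs.Search API.
class FolderRegexSearch {

    // Walks root recursively and returns the number of regex matches for each .txt file
    // that has at least one match. Keys are paths relative to root.
    static Map<String, Integer> search(Path root, Pattern pattern) {
        Map<String, Integer> hits = new TreeMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                // Assumption: only plain-text files are handled in this sketch.
                if (!Files.isRegularFile(p) || !p.getFileName().toString().endsWith(".txt")) {
                    continue;
                }
                String text = new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
                Matcher m = pattern.matcher(text);
                int count = 0;
                while (m.find()) {
                    count++;
                }
                if (count > 0) {
                    hits.put(root.relativize(p).toString(), count);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Search the folder given as the first argument, or the current folder by default.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        // Words that contain a letter immediately followed by the same letter.
        Pattern pattern = Pattern.compile("\\b\\p{L}*(\\p{L})\\1\\p{L}*\\b");
        search(root, pattern).forEach((file, count) -> System.out.println(file + ": " + count));
    }
}
```

A dedicated search library adds what this sketch lacks: text extraction from binary formats such as DOCX and PDF, and an index, so repeated queries do not re-read every file.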

The regex used in this example detects words with consecutive repeated characters, such as agree, call, and soon. The search runs quickly across files of different formats in different folders.
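To see what such a pattern looks like in practice, here is a small sketch using `java.util.regex` from the standard library. The pattern is our own formulation for "a word containing a doubled letter"; it is not necessarily the exact query string that GroupDocs.Search expects, so check the library's documentation for its regex query syntax. The class name `RepeatedCharWords` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: shows the regex idea with java.util.regex only.
class RepeatedCharWords {

    // (\p{L})\1 captures one letter and requires the same letter right after it;
    // the surrounding \p{L}* and \b extend the match to the whole word.
    static final Pattern REPEATED =
            Pattern.compile("\\b\\p{L}*(\\p{L})\\1\\p{L}*\\b");

    // Returns every word in text that contains a consecutive repeated letter.
    static List<String> find(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = REPEATED.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [agree, call, soon]
        System.out.println(find("We agree to call you soon, okay?"));
    }
}
```

The backreference `\1` is what makes this work: it matches whatever the first capturing group matched, so `ee`, `ll`, and `oo` all qualify. The comparison is case-sensitive as written.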

The matches found by the regex search are highlighted in the HTML output:

Highlighted Results of Regex Search in HTML format
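The idea behind highlighted output can be sketched with the standard library alone. The snippet assumes plain-text input and wraps each regex match in an HTML `<mark>` element, escaping the rest of the text. It is an illustration of the concept, not how GroupDocs.Search generates its HTML, and `HtmlHighlighter` is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: wraps regex matches in <mark> tags for HTML display.
class HtmlHighlighter {

    // Escapes the characters that would otherwise be read as HTML markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Returns text as an HTML fragment with every match of pattern wrapped in <mark>.
    static String highlight(String text, Pattern pattern) {
        StringBuilder out = new StringBuilder();
        Matcher m = pattern.matcher(text);
        int last = 0;
        while (m.find()) {
            out.append(escape(text.substring(last, m.start())));
            out.append("<mark>").append(escape(m.group())).append("</mark>");
            last = m.end();
        }
        out.append(escape(text.substring(last)));
        return out.toString();
    }

    public static void main(String[] args) {
        Pattern repeated = Pattern.compile("\\b\\p{L}*(\\p{L})\\1\\p{L}*\\b");
        // prints I <mark>agree</mark> &amp; <mark>call</mark>
        System.out.println(highlight("I agree & call", repeated));
    }
}
```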

Printing Search Results

The search results can be displayed in two ways:

  1. Highlight all the discovered words.
  2. Print the results in a format that is easy to read and analyze.

The printed results look like this:

Document: English.txt
Occurrences: 83
	Field: content
	Occurrences: 82
		acceptance          1
		added               1
		agreeable           1
		agreed              1
		all                 4
		appearance          1
		assurance           1
...
===========================================
Document: Lorem ipsum.docx
Occurrences: 945
...
	Field: content
	Occurrences: 939
		accumsan            39
		class               7
		commodo             40
		convallis           38
		dignissim           35
		efficitur           46
		fringilla           40
		habitasse           2
		laoreet             27
		massa               63
		mattis              31
...
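A report in this layout is straightforward to produce once you have the terms and their counts for a document field. The sketch below assumes those counts are already available in a `Map`; how you obtain them from the search results depends on the library's API, which is not shown here. `TermReport` and `format` are hypothetical names, the sample counts are taken from the output above, and the sketch prints a single field rather than the document-level total.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: formats term counts in the column layout shown above.
class TermReport {

    // Builds the report for one field of one document. The field total is the sum
    // of the term counts; each term is left-aligned in a 20-character column.
    static String format(String document, String field, Map<String, Integer> termCounts) {
        int total = 0;
        for (int count : termCounts.values()) {
            total += count;
        }
        StringBuilder sb = new StringBuilder();
        sb.append("Document: ").append(document).append('\n');
        sb.append("\tField: ").append(field).append('\n');
        sb.append("\tOccurrences: ").append(total).append('\n');
        for (Map.Entry<String, Integer> e : termCounts.entrySet()) {
            sb.append(String.format("\t\t%-20s%d\n", e.getKey(), e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so terms print in the order added.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("agreed", 1);
        counts.put("all", 4);
        System.out.print(format("English.txt", "content", counts));
    }
}
```

The `%-20s` format specifier pads each term to 20 characters, which keeps the counts aligned in one column as long as no term is longer than that.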

Getting a Free License or a Free Trial

Free License

Obtain a temporary license for free to explore this library without constraints.

Free Trial

You can download the free trial from the downloads section.

Conclusion

In this article, we explored the basics of regex search for locating words that match specific patterns in a diverse range of text-based documents, such as DOCX, PDF, and TXT files, across multiple folders using Java. We then presented the search results by highlighting the matched words and printing them in a clear format.

For a thorough understanding of the API, readers are encouraged to explore the documentation and API Reference.

Any questions or further discussions can be addressed in the forum.

