Regular expressions are compact patterns that describe the text you want to find, which makes them a natural fit for search in Java applications. This article covers the basics of regex search in Java, so that you can search documents of different file formats across folders.
Java API for Regex Search in Documents
For regex search, we’ll use the GroupDocs.Search for Java API. Given a regex pattern, it searches for matching text in files of different formats across folders. Supported file types include Word documents, spreadsheets, presentations, PDF files, markup files, eBooks, email messages, OneNote documents, and ZIP archives, all searchable programmatically.
For the full list of supported file formats, see the documentation.
You have two options for getting the JAR file: download it from the downloads section, or add the latest Maven repository and dependency configuration to your Java application.
Searching in Files with Regex using Java
Follow these steps to run a regex search across multiple files of different formats within folders using Java. Afterwards, you can create highlighted HTML output files for each document found.
- Create an Index, specifying the path of the index folder.
- Add to the index the parent folder that contains the documents you want to search.
- Set up the regex search query.
- Use the search method to run the search and obtain the results.
- Go through the search results and generate the output in the form you need.
The regex used in this example detects words with consecutive repeated characters, such as agree, call, and soon. The search runs across files of different formats in different folders.
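As a rough illustration of the pattern itself, the sketch below finds words with consecutive repeated characters using only java.util.regex from the JDK. It does not use the GroupDocs.Search API, and the class name RepeatedCharWords and its find method are hypothetical names chosen for this example. The match is case-sensitive, and \w also accepts digits and underscores.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, JDK only: not part of GroupDocs.Search.
class RepeatedCharWords {
    // (\w) captures one word character and \1 requires the same character
    // immediately after it; \b...\b extends the match to the whole word.
    private static final Pattern REPEATED =
            Pattern.compile("\\b\\w*(\\w)\\1\\w*\\b");

    // Returns every word in the text that contains a doubled character,
    // in order of appearance.
    static List<String> find(String text) {
        List<String> words = new ArrayList<>();
        Matcher matcher = REPEATED.matcher(text);
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(find("We agree to call you soon, not later."));
    }
}
```

Running main prints [agree, call, soon]. Words such as "We" or "later" are skipped because no character in them is immediately followed by itself.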
The matches found by the regex search can then be highlighted in each located document.
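To picture the highlighting step, here is a minimal sketch that wraps each match in an HTML mark element. It assumes the document text is already available as a plain string and uses only the JDK. The class MarkHighlighter and its methods are hypothetical and stand in for, rather than reproduce, the highlighting that GroupDocs.Search provides.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, JDK only: not part of GroupDocs.Search.
class MarkHighlighter {
    // Words that contain a character immediately followed by itself.
    private static final Pattern REPEATED =
            Pattern.compile("\\b\\w*(\\w)\\1\\w*\\b");

    // Escapes the characters that are significant in HTML text content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Returns an HTML fragment in which every match is wrapped in <mark>.
    static String highlight(String text) {
        StringBuilder html = new StringBuilder();
        Matcher matcher = REPEATED.matcher(text);
        int last = 0;
        while (matcher.find()) {
            // Copy the unmatched text before this match, then the match itself.
            html.append(escape(text.substring(last, matcher.start())));
            html.append("<mark>").append(escape(matcher.group())).append("</mark>");
            last = matcher.end();
        }
        html.append(escape(text.substring(last)));
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("We agree & call <soon>"));
    }
}
```

Escaping the text between matches keeps characters such as & and < from being interpreted as markup when the fragment is embedded in an HTML page.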
Printing Search Results
There are two ways to present your search results:
- Highlight all the discovered words.
- Print the results in a format that is easy to read and analyze.
Document: English.txt
Occurrences: 83
Field: content
Occurrences: 82
acceptance 1
added 1
agreeable 1
agreed 1
all 4
appearance 1
assurance 1
...
===========================================
Document: Lorem ipsum.docx
Occurrences: 945
...
Field: content
Occurrences: 939
accumsan 39
class 7
commodo 40
convallis 38
dignissim 35
efficitur 46
fringilla 40
habitasse 2
laoreet 27
massa 63
mattis 31
...
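A report in the layout above can be approximated with plain Java. The sketch below is only an illustration under stated assumptions: each document is given as a map from field name to text, the pattern is the repeated-character regex, and the class SearchReport and its methods are hypothetical names. It does not read the GroupDocs.Search result objects.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, JDK only: not part of GroupDocs.Search.
class SearchReport {
    // Words that contain a character immediately followed by itself.
    private static final Pattern REPEATED =
            Pattern.compile("\\b\\w*(\\w)\\1\\w*\\b");

    // Counts each matching word, lower-cased, sorted alphabetically.
    static Map<String, Integer> countTerms(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher matcher = REPEATED.matcher(text);
        while (matcher.find()) {
            counts.merge(matcher.group().toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }

    // Builds the report for one document; 'fields' maps field name to text.
    // Fields without any match are left out of the report.
    static String format(String documentName, Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        int total = 0;
        for (Map.Entry<String, String> field : fields.entrySet()) {
            Map<String, Integer> counts = countTerms(field.getValue());
            int fieldTotal = counts.values().stream().mapToInt(Integer::intValue).sum();
            if (fieldTotal == 0) {
                continue;
            }
            total += fieldTotal;
            body.append("Field: ").append(field.getKey()).append('\n');
            body.append("Occurrences: ").append(fieldTotal).append('\n');
            counts.forEach((term, n) -> body.append(term).append(' ').append(n).append('\n'));
        }
        return "Document: " + documentName + "\nOccurrences: " + total + "\n" + body;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("filename", "English.txt");
        fields.put("content", "All agreed; all added.");
        System.out.print(format("English.txt", fields));
    }
}
```

For the sample input in main, the report lists a document total of 4, followed by the content field with added 1, agreed 1, and all 2. The filename field is omitted because it contains no match.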
Getting a Free License or a Free Trial
Free License
Obtain a temporary license for free to explore this library without constraints.
Free Trial
You can download the free trial from the downloads section.
Conclusion
In this article, we explored the basics of regex search in Java: locating words that match a specific pattern in a diverse range of text-based documents, such as DOCX, PDF, and TXT files, across multiple folders. We then presented the search results by highlighting the matched words and printing them in a clear format.
For a thorough understanding of the API, readers are encouraged to explore the documentation and API Reference.
Any questions or further discussions can be addressed in the forum.