Need to search through a large number of files, or look inside files spread across several folders? This article shows how to search for multiple words or phrases in files of different formats stored across multiple folders using Java.

Java API for Scanning Text across Files and Folders

Today, we’ll use the GroupDocs.Search for Java API to search for multiple texts in files of various formats across folders. This library lets us programmatically scan the text of a wide range of file formats, including Word documents, spreadsheets, presentations, PDF files, markup files, eBooks, email messages, OneNote documents, and ZIP archives. The documentation provides a detailed list of supported file formats.

You can download the JAR file from the downloads section, or add the latest repository and dependency configurations to the Maven build of your Java application.

Searching Multiple Texts in Files across Folders using Java

Follow these steps to search for text in multiple files across multiple folders using Java and to generate a highlighted HTML output file for each document found.

  • Create an Index object with the specified index folder path.
  • Index the parent document folder using the add method.
  • Define a search query with multiple terms or phrases.
  • Run the search using the search method and store the results.
  • Iterate through the search results:
    • Access each found document using the getFoundDocument method.
    • Access or print any file information for the found document.
    • Set up an OutputAdapter for the desired format and path.
    • Create a Highlighter for the document.
    • Highlight and output the search results to an HTML file using the highlight method.

Together, these steps let you search for specific texts within multiple files and generate a highlighted HTML output file for each found document.
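For the query step, the words and phrases have to be combined into a single query string. The following is a minimal sketch in plain Java, assuming a text query syntax in which OR separates alternatives and double quotes mark an exact phrase. `SearchQueryBuilder` and `anyOf` are hypothetical helper names introduced here for illustration and are not part of GroupDocs.Search. The sample terms are the ones that appear in the search output later in this article.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not a GroupDocs.Search class): builds one query string
// from several words and phrases.
final class SearchQueryBuilder {

    private SearchQueryBuilder() { }

    // Joins the given texts with OR. Texts that contain whitespace are wrapped
    // in double quotes so they are searched as phrases, not as separate words.
    // Assumption: the texts themselves contain no double quotes.
    static String anyOf(List<String> texts) {
        return texts.stream()
                .map(String::trim)
                .filter(t -> !t.isEmpty())
                .map(t -> t.chars().anyMatch(Character::isWhitespace) ? "\"" + t + "\"" : t)
                .collect(Collectors.joining(" OR "));
    }

    public static void main(String[] args) {
        String query = anyOf(Arrays.asList("water", "Lorem ipsum", "non"));
        System.out.println(query); // prints: water OR "Lorem ipsum" OR non
        // This string is what would be handed to the index's search method.
    }
}
```

Building the query from a list keeps the search terms in one place, so adding another word or phrase does not require editing a hand-written query string.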

[Figure: Highlighted text search results in HTML format]

Printing the Text Search Results

From the result of the search query, you can further extract information about the found documents.

Printing the search results obtained from the DOCX, PDF, and TXT files produces the following output:

File Name: Lorem ipsum.docx
Occurrences: 101
	Field: filename
	Occurrences: 1
		lorem ipsum  - 1
	Field: content
	Occurrences: 100
		non - 94
		lorem ipsum  - 6
====================================
File Name: Lorem ipsum.pdf
Occurrences: 60
	Field: filename
	Occurrences: 1
		lorem ipsum  - 1
	Field: content
	Occurrences: 59
		non - 53
		lorem ipsum  - 6
====================================
File Name: English.txt
Occurrences: 39
	Field: content
	Occurrences: 39
		water - 39
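
The report nests three levels: document, field, and term or phrase. In the output above, each field total equals the sum of its term counts, and each document total equals the sum of its field totals (for the DOCX file, 94 + 6 = 100 and 1 + 100 = 101). The sketch below makes that roll-up explicit with plain-Java stand-in types. `FieldHits`, `DocumentHits`, and `ReportDemo` are hypothetical names used only for illustration. They are not GroupDocs.Search classes, and the counts are copied from the DOCX result above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one searched field (for example "filename" or "content").
final class FieldHits {
    final String fieldName;
    final Map<String, Integer> termCounts = new LinkedHashMap<>(); // keeps insertion order

    FieldHits(String fieldName) { this.fieldName = fieldName; }

    FieldHits term(String text, int count) { termCounts.put(text, count); return this; }

    // A field's total is the sum of its term and phrase counts.
    int occurrences() {
        int sum = 0;
        for (int c : termCounts.values()) sum += c;
        return sum;
    }
}

// Hypothetical stand-in for one found document.
final class DocumentHits {
    final String fileName;
    final List<FieldHits> fields = new ArrayList<>();

    DocumentHits(String fileName) { this.fileName = fileName; }

    DocumentHits field(FieldHits f) { fields.add(f); return this; }

    // A document's total is the sum of its field totals.
    int occurrences() {
        int sum = 0;
        for (FieldHits f : fields) sum += f.occurrences();
        return sum;
    }

    // Renders the document in the same three-level layout as the report above.
    String report() {
        StringBuilder sb = new StringBuilder();
        sb.append("File Name: ").append(fileName).append('\n');
        sb.append("Occurrences: ").append(occurrences()).append('\n');
        for (FieldHits f : fields) {
            sb.append("\tField: ").append(f.fieldName).append('\n');
            sb.append("\tOccurrences: ").append(f.occurrences()).append('\n');
            for (Map.Entry<String, Integer> e : f.termCounts.entrySet()) {
                sb.append("\t\t").append(e.getKey()).append(" - ").append(e.getValue()).append('\n');
            }
        }
        return sb.toString();
    }
}

final class ReportDemo {
    public static void main(String[] args) {
        DocumentHits docx = new DocumentHits("Lorem ipsum.docx")
                .field(new FieldHits("filename").term("lorem ipsum", 1))
                .field(new FieldHits("content").term("non", 94).term("lorem ipsum", 6));
        System.out.print(docx.report()); // document total: 1 + (94 + 6) = 101
    }
}
```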

Complete code

Here is the complete Java code that collectively searches the text strings and phrases in multiple files and folders:

Getting a Free License or a Free Trial

Free License

To evaluate the library without trial limitations, you can obtain a free temporary license.

Free Trial

Download the free trial from the downloads section.


Conclusion

In this article, we explored how to search for multiple texts in multiple files across multiple folders using Java. Starting with a search query, we searched within the indexed files and folders and highlighted the matches for each found document in its own HTML file.

For detailed API information, readers are encouraged to consult the documentation. Questions and further discussions can be directed to the provided forum.

