Search Homophones in Files using GroupDocs

Synonyms are words with similar meanings, while homophones are words that sound the same but differ in meaning or spelling. We previously learned how to find synonyms in multiple documents using Java. In this article, we’ll see how to search for homophones within multiple documents using Java.

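The following topics are covered below:

  • Java API for Searching Homophones
  • Find Homophones in Multiple Files in Java
  • Search Homophones and Printing Results using Java - Complete Code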

Java API for Searching Homophones

GroupDocs.Search provides a Java API, GroupDocs.Search for Java, that finds any word and its homophones within multiple files of a specific folder. It can search the content of many different file formats. In addition to finding homophones, the API supports many other search techniques, including:

  • Case-Sensitive Search
  • Fuzzy Search
  • Phrase Search
  • Regular Expressions Search
  • Synonym Search
  • Wildcard Search
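
To make the idea of a homophone search concrete, here is a small library-free sketch. It does not use GroupDocs.Search and says nothing about how the API works internally; it simply expands a query word into a hand-written homophone group and counts whole-word matches in a string. The class name, the group contents, and the sample text are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative only: a toy homophone lookup, not the GroupDocs.Search API.
class HomophoneSketch {
    // One hand-written homophone group (assumed for this example)
    static final List<String> GROUP = List.of("right", "rite", "wright", "write");

    // Returns the query's homophone group, or just the query if it has none
    static List<String> expand(String query) {
        String word = query.toLowerCase(Locale.ROOT);
        return GROUP.contains(word) ? GROUP : List.of(word);
    }

    // Counts whole-word occurrences of the query and its homophones in the text
    static Map<String, Integer> count(String query, String text) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        List<String> terms = expand(query);
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (terms.contains(token)) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "Write the right answer, then write it again.";
        System.out.println(count("right", text));
    }
}
```

A real search API works over indexed documents and a full homophone dictionary rather than a single hard-coded group, but the query expansion step is the same in spirit.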

You can download the JAR file from the downloads section, or add the latest repository and dependency configuration to the pom.xml of your Maven-based Java application.

<repository>
	<id>GroupDocsJavaAPI</id>
	<name>GroupDocs Java API</name>
	<url>http://repository.groupdocs.com/repo/</url>
</repository>
<dependency>
        <groupId>com.groupdocs</groupId>
        <artifactId>groupdocs-search</artifactId>
        <version>21.8</version> 
</dependency>

Find Homophones in Multiple Files in Java

Follow these steps to search for homophones in multiple files of a folder in Java:

  • Define the search query word, the index folder, and the folder that contains your files.
  • Create an Index with the defined index folder.
  • Add the documents folder to the index.
  • Define the SearchOptions and enable homophone search using the setUseHomophoneSearch method.
  • Perform the homophone search using the search method.
  • Use the properties of the retrieved SearchResult as needed.

The following Java source code finds all the homophones within the files of the defined folder. You can also manage your homophone dictionary.
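
The steps above can be sketched roughly as follows. This is a minimal sketch, assuming the groupdocs-search dependency is on the classpath and that the classes live in the packages imported below; check them against the version you use. The class name HomophoneSearchExample is hypothetical and the folder paths are placeholders.

```java
import com.groupdocs.search.Index;
import com.groupdocs.search.options.SearchOptions;
import com.groupdocs.search.results.SearchResult;

// Hypothetical class name; the paths below are placeholders.
public class HomophoneSearchExample {
    public static void main(String[] args) {
        // Search word, index folder, and folder holding the documents
        String query = "right";
        String indexFolder = "path/indexFolder";
        String documentsFolder = "path/documentsFolder";

        // Create an index in the index folder and add the documents to it
        Index index = new Index(indexFolder);
        index.add(documentsFolder);

        // Enable homophone search through the search options
        SearchOptions options = new SearchOptions();
        options.setUseHomophoneSearch(true);

        // Search for the query word together with its homophones
        SearchResult result = index.search(query, options);

        // Print a summary of the result (getter names assumed from the API)
        System.out.println("Query: " + query);
        System.out.println("Documents: " + result.getDocumentCount());
        System.out.println("Occurrences: " + result.getOccurrenceCount());
    }
}
```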

The output of the above code is as follows:

Query: right
Documents: 2
Occurrences: 17

Once the search has returned the homophones and their occurrences in each document, you can work through the results with the following steps:

  • Traverse the search results.
  • Get each FoundDocument using the getFoundDocument method.
  • Use the properties of each FoundDocument as required.
  • Traverse the fields of each FoundDocument to get its FoundDocumentField objects.
  • From each FoundDocumentField, get all the terms and their occurrences within the document.

The following Java code example prints the homophone search results along with the number of occurrences of each searched term.
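
The traversal steps above might look like the following sketch. It assumes a SearchResult obtained from the search method with homophone search enabled. The class name HomophoneResultPrinter and the method name printResults are hypothetical, and apart from getFoundDocument the getter names are assumptions to verify against the API reference for your version of GroupDocs.Search for Java.

```java
import com.groupdocs.search.results.FoundDocument;
import com.groupdocs.search.results.FoundDocumentField;
import com.groupdocs.search.results.SearchResult;

// Hypothetical helper class; not part of GroupDocs.Search.
public class HomophoneResultPrinter {

    // Prints a summary, then each document, field, and term with its count.
    public static void printResults(String query, SearchResult result) {
        System.out.println("Query: " + query);
        System.out.println("Documents: " + result.getDocumentCount());
        System.out.println("Total occurrences: " + result.getOccurrenceCount());
        System.out.println();

        // Traverse the found documents
        for (int i = 0; i < result.getDocumentCount(); i++) {
            FoundDocument document = result.getFoundDocument(i);
            System.out.println("Document: " + document.getDocumentInfo().getFilePath());
            System.out.println("Occurrences: " + document.getOccurrenceCount());

            // Traverse the fields in which matches were found
            for (FoundDocumentField field : document.getFoundFields()) {
                System.out.println("    Field: " + field.getFieldName());
                System.out.println("    Occurrences: " + field.getOccurrenceCount());

                // Terms and their counts are assumed to be parallel arrays
                String[] terms = field.getTerms();
                int[] counts = field.getTermsOccurrences();
                if (terms != null) {
                    for (int k = 0; k < terms.length; k++) {
                        System.out.printf("        %-18s%d%n", terms[k], counts[k]);
                    }
                }
            }
        }
    }
}
```

To use it, pass the query string and the SearchResult returned by the search, for example `HomophoneResultPrinter.printResults(query, result)`.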

The following is the output of the above code example.

Query: right
Documents: 2
Total occurrences: 17

Document: C:/documents/sample.docx
Occurrences: 11
    Field: content
    Occurrences: 11
        right             3
        rite               4
        wright           1
        write             3
Document: C:/documents/sample.txt
Occurrences: 6
    Field: content
    Occurrences: 6
        right             4
        write             2

Search Homophones and Printing Results using Java - Complete Code

The following Java code combines the above steps. It first finds the homophones for the query, and then prints all the occurrences of those homophones from each document within the provided folder.

Conclusion

To conclude, you learned how to find words and their homophones in multiple documents within a specified folder using Java. You can try developing your own Java application for searching homophones using GroupDocs.Search for Java.

Learn more about the Java Search Automation API from the documentation. To experience its features, you can have a look at the available examples in the GitHub repository. Reach out to us with any queries via the forum.
