Search Homophones in Files using GroupDocs

Synonyms are words with similar meanings, while homophones sound the same but differ in meaning or spelling. In an earlier article, we learned how to find synonyms in multiple documents using Java. In this article, we’ll see how to search for homophones within multiple documents using Java.


Java API for Searching Homophones

GroupDocs.Search provides a Java API, GroupDocs.Search for Java, that finds any word and its homophones within the files of a specified folder. It can search the content of many document formats. In addition to homophone search, the API supports other search techniques, including:

  • Case-Sensitive Search
  • Fuzzy Search
  • Phrase Search
  • Regular Expressions Search
  • Synonym Search
  • Wild Card Search

You can download the JAR file from the downloads section or add the following repository and dependency configurations to the pom.xml of your Maven-based Java application.

<repository>
    <id>GroupDocsJavaAPI</id>
    <name>GroupDocs Java API</name>
    <url>http://repository.groupdocs.com/repo/</url>
</repository>
<dependency>
    <groupId>com.groupdocs</groupId>
    <artifactId>groupdocs-search</artifactId>
    <version>21.8</version>
</dependency>

Find Homophones in Multiple Files in Java

The following steps show how to search for homophones in multiple files of a folder in Java.

  • Define the search query, the index folder, and the folder that contains your files.
  • Create an Index in the defined index folder.
  • Add the documents folder to the index.
  • Define SearchOptions and enable homophone search using the setUseHomophoneSearch method.
  • Perform the homophone search using the search method.
  • Use the properties of the retrieved SearchResult as needed.

The following Java source code finds the query word and all of its homophones within the files of the defined folder. You can also manage your homophone dictionary.

// Search homophones in multiple files and folders using Java
String indexFolder = "path/indexFolder";
String documentsFolder = "path/documentsFolder";
String query = "right";
// Creating an index in the specified folder
Index index = new Index(indexFolder);
index.add(documentsFolder);
// Creating a search options object
SearchOptions options = new SearchOptions();
options.setUseHomophoneSearch(true); // Enable Homophone Search
// Search for the word 'right'
// In addition to the word 'right', the homophones 'rite, write, wright, ...' will also be searched
SearchResult result = index.search(query, options);
System.out.println("Query: " + query);
System.out.println("Documents: " + result.getDocumentCount());
System.out.println("Word & Homophone Occurrences: " + result.getOccurrenceCount());

The output of the above code is as follows:

Query: right
Documents: 2
Word & Homophone Occurrences: 17
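
To make the idea behind this result more concrete, a homophone search can be thought of as expanding the query word into its homophone group before matching. The following is only a minimal plain-Java sketch of that idea and not how GroupDocs.Search is implemented; the class HomophoneSketch, its tiny hand-made dictionary, and the sample sentence are all hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class HomophoneSketch {

    // Hypothetical, hand-made homophone groups; a real dictionary is far larger.
    private static final List<Set<String>> GROUPS = List.of(
            new LinkedHashSet<>(Arrays.asList("right", "rite", "wright", "write")),
            new LinkedHashSet<>(Arrays.asList("sea", "see")));

    // Returns the query word plus every word that shares a homophone group with it.
    public static Set<String> expand(String word) {
        Set<String> result = new LinkedHashSet<>();
        String key = word.toLowerCase(Locale.ROOT);
        result.add(key);
        for (Set<String> group : GROUPS) {
            if (group.contains(key)) {
                result.addAll(group);
            }
        }
        return result;
    }

    // Counts the words in the text that match the query or one of its homophones.
    public static int countOccurrences(String text, String query) {
        Set<String> terms = expand(query);
        int count = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (terms.contains(token)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "Write it down, right now: the rite begins at noon.";
        System.out.println(expand("right"));                 // [right, rite, wright, write]
        System.out.println(countOccurrences(text, "right")); // 3
    }
}
```

In this sketch a word without a known group simply expands to itself, so the search falls back to an ordinary word match.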

Once you have the search result, you can inspect the homophones and their occurrences in each document by following these steps:

  • Traverse the search results.
  • Get each FoundDocument using the getFoundDocument method.
  • Use the properties of each FoundDocument as required.
  • Traverse the fields of each FoundDocument by getting its FoundDocumentField objects.
  • From each FoundDocumentField, get all the terms and their occurrences within the document.

The following Java code example prints the homophone search results along with the number of occurrences of each searched term.

// Printing the Homophone Search results in Java
System.out.println("Query: " + query);
System.out.println("Documents: " + result.getDocumentCount());
System.out.println("Word & Homophone Occurrences: " + result.getOccurrenceCount());
// Traverse the Documents
for (int i = 0; i < result.getDocumentCount(); i++) {
    FoundDocument document = result.getFoundDocument(i);
    System.out.println("Document: " + document.getDocumentInfo().getFilePath());
    System.out.println("Occurrences: " + document.getOccurrenceCount());
    // Traverse the found fields
    for (FoundDocumentField field : document.getFoundFields()) {
        System.out.println("\tField: " + field.getFieldName());
        // Occurrences within this field, not within the whole document
        System.out.println("\tOccurrences: " + field.getOccurrenceCount());
        // Printing found terms
        if (field.getTerms() != null) {
            for (int k = 0; k < field.getTerms().length; k++) {
                System.out.println("\t\t" + field.getTerms()[k] + "\t - \t" + field.getTermsOccurrences()[k]);
            }
        }
    }
}

The following is the output of the above code example.

Query: right
Documents: 2
Word & Homophone Occurrences: 17

Document: C:/documents/sample.docx
Occurrences: 11
    Field: content
    Occurrences: 11
        right             3
        rite               4
        wright           1
        write             3
Document: C:/documents/sample.txt
Occurrences: 6
    Field: content
    Occurrences: 6
        right             4
        write             2
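
If you need one total per term across all documents rather than a per-document listing, the counts can be summed in a map. The sketch below is a plain-Java illustration that uses the numbers from the sample output above as hard-coded data; the class TermTotals and its methods are hypothetical names. In a real application, the inner maps would presumably be filled from getTerms and getTermsOccurrences inside the traversal loop.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

class TermTotals {

    // Sums per-document term counts into one total per term, sorted by term.
    public static Map<String, Integer> totals(Map<String, Map<String, Integer>> perDocument) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> terms : perDocument.values()) {
            for (Map.Entry<String, Integer> entry : terms.entrySet()) {
                totals.merge(entry.getKey(), entry.getValue(), Integer::sum);
            }
        }
        return totals;
    }

    // Hard-coded copy of the per-document counts from the sample output.
    public static Map<String, Map<String, Integer>> sampleCounts() {
        Map<String, Integer> docx = new LinkedHashMap<>();
        docx.put("right", 3);
        docx.put("rite", 4);
        docx.put("wright", 1);
        docx.put("write", 3);
        Map<String, Integer> txt = new LinkedHashMap<>();
        txt.put("right", 4);
        txt.put("write", 2);
        Map<String, Map<String, Integer>> perDocument = new LinkedHashMap<>();
        perDocument.put("C:/documents/sample.docx", docx);
        perDocument.put("C:/documents/sample.txt", txt);
        return perDocument;
    }

    public static void main(String[] args) {
        Map<String, Integer> totals = totals(sampleCounts());
        System.out.println(totals); // {right=7, rite=4, wright=1, write=5}
        int sum = totals.values().stream().mapToInt(Integer::intValue).sum();
        System.out.println("Total: " + sum); // Total: 17
    }
}
```

The grand total of 17 matches the overall occurrence count reported for the query, which is a quick sanity check that no term was missed.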

Search Homophones and Print Results using Java - Complete Code

The following Java code combines the above steps. It first finds the homophones for the query and then prints the occurrences of each homophone in every document within the provided folder.

// Search homophones in multiple files and folders using Java
// Imports assume the package layout used in the GroupDocs.Search for Java examples
import com.groupdocs.search.*;
import com.groupdocs.search.options.*;
import com.groupdocs.search.results.*;

String indexFolder = "path/indexFolder";
String documentsFolder = "path/documentsFolder";
String query = "right";
// Creating an index in the specified folder
Index index = new Index(indexFolder);
index.add(documentsFolder);
// Creating a search options object
SearchOptions options = new SearchOptions();
options.setUseHomophoneSearch(true); // Enable Homophone Search
// Search for the word 'right'
// In addition to the word 'right', the homophones 'rite, write, wright, ...' will also be searched
SearchResult result = index.search(query, options);
System.out.println("Query: " + query);
System.out.println("Documents: " + result.getDocumentCount());
System.out.println("Word & Homophone Occurrences: " + result.getOccurrenceCount());
// Traverse the documents
for (int i = 0; i < result.getDocumentCount(); i++) {
    FoundDocument document = result.getFoundDocument(i);
    System.out.println("Document: " + document.getDocumentInfo().getFilePath());
    System.out.println("Occurrences: " + document.getOccurrenceCount());
    // Traverse the found fields
    for (FoundDocumentField field : document.getFoundFields()) {
        System.out.println("\tField: " + field.getFieldName());
        // Occurrences within this field, not within the whole document
        System.out.println("\tOccurrences: " + field.getOccurrenceCount());
        // Printing found terms
        if (field.getTerms() != null) {
            for (int k = 0; k < field.getTerms().length; k++) {
                System.out.println("\t\t" + field.getTerms()[k] + "\t - \t" + field.getTermsOccurrences()[k]);
            }
        }
    }
}

Conclusion

To conclude, you learned how to find words and their homophones in multiple documents within a specified folder using Java. You can try developing your own Java application for searching homophones using GroupDocs.Search for Java.

Learn more about the Java Search Automation API from the documentation. To experience its features, have a look at the available examples in the GitHub repository. If you have any questions, reach us via the forum.
