Synonyms are words with similar meanings, whereas homophones are words that sound the same but differ in meaning or spelling. We have already learned how to find synonyms in multiple documents using Java. In this article, we’ll see how to search for homophones within multiple documents using Java.
The following topics will be covered below:
- Java API for Searching Homophones
- Find Homophones in Multiple Files in Java
- Printing Homophone Search Results in Java
- Search Homophones and Printing Results using Java - Complete Code
Java API for Searching Homophones
GroupDocs.Search for Java is a Java API that finds any word and its homophones across multiple files in a specified folder. It can search the content of documents in many different formats. In addition to homophone search, the API supports many other search techniques, including:
- Case-Sensitive Search
- Fuzzy Search
- Phrase Search
- Regular Expressions Search
- Synonym Search
- Wildcard Search
You can download the JAR file from the downloads section, or add the following repository and dependency configuration to the pom.xml of your Maven-based Java application.
<repository>
<id>GroupDocsJavaAPI</id>
<name>GroupDocs Java API</name>
<url>http://repository.groupdocs.com/repo/</url>
</repository>
<dependency>
<groupId>com.groupdocs</groupId>
<artifactId>groupdocs-search</artifactId>
<version>21.8</version>
</dependency>
Find Homophones in Multiple Files in Java
The following steps show how to search for homophones in multiple files of a folder in Java.
- Define the search query word, the index folder, and the folder that contains your files.
- Create an Index in the defined index folder.
- Add the documents folder to the index.
- Create a SearchOptions object and enable homophone search using the setUseHomophoneSearch method.
- Perform the homophone search using the search method.
- Use the properties of the retrieved SearchResult as needed.
The following Java source code finds all the homophones within files of the defined folder. Further, you can also manage your homophone dictionary.
// Search homophones in multiple files and folders using Java
String indexFolder = "path/indexFolder";
String documentsFolder = "path/documentsFolder";
String query = "right";

// Creating an index in the specified folder
Index index = new Index(indexFolder);
index.add(documentsFolder);

// Creating a search options object
SearchOptions options = new SearchOptions();
options.setUseHomophoneSearch(true); // Enable Homophone Search

// Search for the word 'right'
// In addition to the word 'right', the homophones 'rite, write, wright, ...' will also be searched
SearchResult result = index.search(query, options);

System.out.println("Query: " + query);
System.out.println("Documents: " + result.getDocumentCount());
System.out.println("Word & Homophone Occurrences: " + result.getOccurrenceCount());
The output of the above code is as follows:
Query: right
Documents: 2
Word & Homophone Occurrences: 17
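To build some intuition for what enabling homophone search does to a query, here is a small, self-contained sketch in plain Java. It does not use GroupDocs.Search and is not a description of how the library works internally. The class name HomophoneSketch, the single hard-coded homophone group, and the sample sentence are all hypothetical and chosen only for illustration.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Conceptual sketch only: NOT the GroupDocs.Search implementation.
final class HomophoneSketch {

    // Hypothetical homophone dictionary with a single group.
    private static final List<Set<String>> GROUPS =
            List.of(Set.of("right", "rite", "wright", "write"));

    // Returns the query word plus every word that shares a homophone group with it.
    static Set<String> expand(String word) {
        String key = word.toLowerCase(Locale.ROOT);
        Set<String> terms = new TreeSet<>();
        terms.add(key);
        for (Set<String> group : GROUPS) {
            if (group.contains(key)) {
                terms.addAll(group);
            }
        }
        return terms;
    }

    // Counts whole-word, case-insensitive occurrences of each term in the text.
    static Map<String, Integer> countOccurrences(String text, Set<String> terms) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (terms.contains(token)) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "Write the right answer. The rite was right.";
        Set<String> terms = expand("right");
        System.out.println("Expanded terms: " + terms);
        System.out.println("Occurrences: " + countOccurrences(text, terms));
    }
}
```

The point of the sketch is that a single query word is widened to its whole homophone group before matching, which is why the total occurrence count covers more than the literal word 'right'.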
Printing Homophone Search Results in Java
Once the search has returned, you can inspect the homophones and their occurrences in each document by following these steps.
- Traverse the search results.
- Get each FoundDocument using the getFoundDocument method.
- Use the properties of each FoundDocument as required.
- Traverse the fields of each FoundDocument by getting its FoundDocumentField objects.
- From each FoundDocumentField, get all the terms and their occurrences within the document.
The following Java code example prints the homophone search results along with the number of occurrences of each searched term.
// Printing the Homophone Search results in Java
System.out.println("Query: " + query);
System.out.println("Documents: " + result.getDocumentCount());
System.out.println("Word & Homophone Occurrences: " + result.getOccurrenceCount());

// Traverse the Documents
for (int i = 0; i < result.getDocumentCount(); i++) {
    FoundDocument document = result.getFoundDocument(i);
    System.out.println("Document: " + document.getDocumentInfo().getFilePath());
    System.out.println("Occurrences: " + document.getOccurrenceCount());

    // Traverse the found fields
    for (FoundDocumentField field : document.getFoundFields()) {
        System.out.println("\tField: " + field.getFieldName());
        // Report the count for this field, not for the whole document
        System.out.println("\tOccurrences: " + field.getOccurrenceCount());

        // Printing found terms
        if (field.getTerms() != null) {
            for (int k = 0; k < field.getTerms().length; k++) {
                System.out.println("\t\t" + field.getTerms()[k] + "\t - \t" + field.getTermsOccurrences()[k]);
            }
        }
    }
}
The following is the output of the above code example.
Query: right
Documents: 2
Word & Homophone Occurrences: 17
Document: C:/documents/sample.docx
Occurrences: 11
Field: content
Occurrences: 11
right 3
rite 4
wright 1
write 3
Document: C:/documents/sample.txt
Occurrences: 6
Field: content
Occurrences: 6
right 4
write 2
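As a quick sanity check on the sample output above, the per-term counts should add up to each document's occurrence count, and the document counts should add up to the overall total. The following plain-Java sketch redoes that arithmetic with the numbers from the output. The class name OccurrenceTotals and its helper methods are hypothetical and are not part of GroupDocs.Search.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: re-adds the per-term counts from the sample output to confirm the totals.
final class OccurrenceTotals {

    // Per-term counts copied from the sample output, keyed by document path.
    static Map<String, Map<String, Integer>> sampleCounts() {
        Map<String, Map<String, Integer>> counts = new LinkedHashMap<>();
        counts.put("C:/documents/sample.docx",
                Map.of("right", 3, "rite", 4, "wright", 1, "write", 3));
        counts.put("C:/documents/sample.txt",
                Map.of("right", 4, "write", 2));
        return counts;
    }

    // Adds up the term counts of a single document.
    static int documentTotal(Map<String, Integer> termCounts) {
        return termCounts.values().stream().mapToInt(Integer::intValue).sum();
    }

    // Adds up the document totals across all documents.
    static int grandTotal(Map<String, Map<String, Integer>> counts) {
        return counts.values().stream().mapToInt(OccurrenceTotals::documentTotal).sum();
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> counts = sampleCounts();
        counts.forEach((path, terms) ->
                System.out.println(path + ": " + documentTotal(terms)));
        System.out.println("Total: " + grandTotal(counts));
    }
}
```

Here 3 + 4 + 1 + 3 gives 11 for sample.docx and 4 + 2 gives 6 for sample.txt, which together make the 17 occurrences reported for the query.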
Search Homophones and Printing Results using Java - Complete Code
The following Java code combines the above steps. It first finds the homophones for the query, and then prints all the occurrences of the homophones from each document within the provided folder.
import com.groupdocs.search.Index;
import com.groupdocs.search.options.SearchOptions;
import com.groupdocs.search.results.FoundDocument;
import com.groupdocs.search.results.FoundDocumentField;
import com.groupdocs.search.results.SearchResult;

// Search homophones in multiple files and folders using Java
public class HomophoneSearchExample {
    public static void main(String[] args) {
        String indexFolder = "path/indexFolder";
        String documentsFolder = "path/documentsFolder";
        String query = "right";

        // Creating an index in the specified folder
        Index index = new Index(indexFolder);
        index.add(documentsFolder);

        // Creating a search options object
        SearchOptions options = new SearchOptions();
        options.setUseHomophoneSearch(true); // Enable Homophone Search

        // Search for the word 'right'
        // In addition to the word 'right', the homophones 'rite, write, wright, ...' will also be searched
        SearchResult result = index.search(query, options);

        System.out.println("Query: " + query);
        System.out.println("Documents: " + result.getDocumentCount());
        System.out.println("Word & Homophone Occurrences: " + result.getOccurrenceCount());

        // Traverse the found documents
        for (int i = 0; i < result.getDocumentCount(); i++) {
            FoundDocument document = result.getFoundDocument(i);
            System.out.println("Document: " + document.getDocumentInfo().getFilePath());
            System.out.println("Occurrences: " + document.getOccurrenceCount());

            // Traverse the found fields
            for (FoundDocumentField field : document.getFoundFields()) {
                System.out.println("\tField: " + field.getFieldName());
                // Report the count for this field, not for the whole document
                System.out.println("\tOccurrences: " + field.getOccurrenceCount());

                // Printing found terms
                if (field.getTerms() != null) {
                    for (int k = 0; k < field.getTerms().length; k++) {
                        System.out.println("\t\t" + field.getTerms()[k] + "\t - \t" + field.getTermsOccurrences()[k]);
                    }
                }
            }
        }
    }
}
Conclusion
To conclude, you learned how to find words and their homophones in multiple documents within a specified folder using Java. You can try developing your own Java application for searching homophones using GroupDocs.Search for Java.
Learn more about the Java Search Automation API from the documentation. To explore its features, have a look at the examples available in the GitHub repository. For any questions, reach us via the forum.