Find Homophones in Multiple Files using C#

Homophones are words that sound the same but differ in meaning or spelling. Homographs, on the other hand, are words that are spelled the same but differ in meaning or pronunciation. Homonyms can be homophones, homographs, or both. Rather than untangling these terms by hand, let’s automate the search. In this article, you’ll learn how to search for homophones within multiple documents using C#.


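The following topics are covered below:

  • .NET API for Searching Homophones in Multiple Files
  • Find Homophones in Multiple Files using C#
  • Search Homophones and Printing Results using C# – Complete Code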

.NET API for Searching Homophones in Multiple Files

GroupDocs.Search for .NET is a .NET API that can search for words and their homophones within multiple files of a specified folder. We will use this API in the examples of this article. It can search the content of many different file formats, and besides homophone search it supports several other search techniques to suit your requirements. Some of them are as follows:

  • Synonym Search
  • Phrase Search
  • Fuzzy Search
  • Case-Sensitive Search
  • Regular Expression Search
  • Wildcard Search

You can download the DLLs or the MSI installer from the downloads section, or install the API in your .NET application via NuGet:

PM> Install-Package GroupDocs.Search

Find Homophones in Multiple Files using C#

The following steps show how to search for homophones (words that are pronounced the same) in the files of a folder using C#:

  • Define the search query, an index folder, and the folder that contains your files.
  • Create an Index in the defined index folder.
  • Add the documents folder to the created index.
  • Define the SearchOptions and set its UseHomophoneSearch property to true.
  • Search for all the homophones by calling the Search method with the query and the search options.
  • Print a summary using the properties of the retrieved SearchResult.

The following C# source code finds all the homophones of the query within all the files of a defined folder. Additionally, you can manage the homophone dictionary used by the index.
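A minimal sketch of the steps above, written as a .NET 6+ top-level program. It assumes the GroupDocs.Search NuGet package is installed and that the member names match those used in the GroupDocs.Search for .NET examples (Index, SearchOptions.UseHomophoneSearch, SearchResult.DocumentCount and OccurrenceCount); check them against the API reference for your version. The folder paths are placeholders. Index is written with its full namespace because a bare Index can clash with System.Index on .NET Core 3.0 and later when both namespaces are imported.

```csharp
using System;
using GroupDocs.Search.Options;
using GroupDocs.Search.Results;

// Placeholder paths: replace them with your own folders.
string indexFolder = @"C:\MyIndex\";
string documentsFolder = @"C:\documents\";
string query = "right";

// Create (or open) an index stored in the index folder.
// Fully qualified to avoid an ambiguity with System.Index.
var index = new GroupDocs.Search.Index(indexFolder);

// Add every supported document in the documents folder to the index.
index.Add(documentsFolder);

// Enable homophone search so that words such as "rite" or "write" also match "right".
var options = new SearchOptions();
options.UseHomophoneSearch = true;

// Run the search and print a summary from the SearchResult properties.
SearchResult result = index.Search(query, options);

Console.WriteLine("Query: " + query);
Console.WriteLine("Documents: " + result.DocumentCount);
Console.WriteLine("Occurrences: " + result.OccurrenceCount);
```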

The output of the above code is as follows:

Query: right
Documents: 2
Occurrences: 17

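The homophone dictionary mentioned earlier belongs to the index and can be adjusted through index.Dictionaries. The sketch below assumes the HomophoneDictionary members shown in the GroupDocs.Search for .NET documentation (Count, Clear, AddRange, GetHomophones, ExportDictionary and ImportDictionary); treat these names as assumptions to verify for your version. The word groups and file paths are illustrative placeholders.

```csharp
using System;
using GroupDocs.Search.Options;
using GroupDocs.Search.Results;

// Placeholder paths: replace them with your own folders and file.
string indexFolder = @"C:\MyIndex\";
string documentsFolder = @"C:\documents\";
string dictionaryFile = @"C:\Dictionaries\Homophones.dat";

// Fully qualified to avoid an ambiguity with System.Index.
var index = new GroupDocs.Search.Index(indexFolder);
var homophoneDictionary = index.Dictionaries.HomophoneDictionary;

// Optionally start from an empty dictionary (this discards the existing entries).
if (homophoneDictionary.Count > 0)
{
    homophoneDictionary.Clear();
}

// Each inner array is one group of words that sound alike (illustrative groups).
string[][] groups =
{
    new[] { "right", "rite", "wright", "write" },
    new[] { "dew", "doe", "dough" },
};
homophoneDictionary.AddRange(groups);

// Look up the homophones of a single word.
string[] doughHomophones = homophoneDictionary.GetHomophones("dough");
Console.WriteLine(string.Join(", ", doughHomophones));

// Save the dictionary to a file; ImportDictionary loads it back into this index.
homophoneDictionary.ExportDictionary(dictionaryFile);
homophoneDictionary.ImportDictionary(dictionaryFile);

// Index the documents and search using the customized dictionary.
index.Add(documentsFolder);
var options = new SearchOptions { UseHomophoneSearch = true };
SearchResult result = index.Search("dough", options);
Console.WriteLine("Occurrences: " + result.OccurrenceCount);
```

Skipping the Clear call extends the existing dictionary instead of replacing it.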
After getting all the homophones and their number of occurrences in each document, follow the steps below to present the homophone search results.

  • Traverse the homophone search results retrieved earlier.
  • Get each document as a FoundDocument using the GetFoundDocument() method.
  • Use the properties of each FoundDocument as required.
  • Next, traverse the FoundFields of each FoundDocument to get each FoundDocumentField.
  • Lastly, from each FoundDocumentField, get its Terms and the number of occurrences of each term within the document.

The following C# source code prints the homophone search results along with the number of occurrences of each searched term.
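A sketch of the printing steps above as a reusable helper. HomophoneReport and FormatTermLine are hypothetical names chosen for this example. The GroupDocs members it reads (GetFoundDocument, DocumentInfo.FilePath, FoundFields, FieldName, OccurrenceCount, Terms, TermsOccurrences) are assumed from the GroupDocs.Search for .NET examples and should be checked against your version.

```csharp
using System;
using GroupDocs.Search.Results;

// Hypothetical helper class for printing a homophone SearchResult.
public static class HomophoneReport
{
    // Formats one found term: two tabs, the term padded to 20 characters, then its count.
    public static string FormatTermLine(string term, int occurrences) =>
        "\t\t" + term.PadRight(20) + occurrences;

    // Prints the per-document, per-field and per-term breakdown of the result.
    public static void Print(string query, SearchResult result)
    {
        Console.WriteLine("Query: " + query);
        Console.WriteLine("Documents: " + result.DocumentCount);
        Console.WriteLine("Total occurrences: " + result.OccurrenceCount);
        Console.WriteLine();

        for (int i = 0; i < result.DocumentCount; i++)
        {
            FoundDocument document = result.GetFoundDocument(i);
            Console.WriteLine("Document: " + document.DocumentInfo.FilePath);
            Console.WriteLine("Occurrences: " + document.OccurrenceCount);

            foreach (FoundDocumentField field in document.FoundFields)
            {
                Console.WriteLine("\tField: " + field.FieldName);
                Console.WriteLine("\tOccurrences: " + field.OccurrenceCount);

                // Terms and TermsOccurrences are parallel arrays; guard against a null Terms array.
                if (field.Terms == null) continue;
                for (int k = 0; k < field.Terms.Length; k++)
                {
                    Console.WriteLine(FormatTermLine(field.Terms[k], field.TermsOccurrences[k]));
                }
            }
        }
    }
}
```

Calling HomophoneReport.Print(query, result) after the search should produce output laid out like the sample that follows. Keeping the term formatting in its own method makes the column layout easy to change or test without running a search.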

The following is the output of the above code example.

Query: right
Documents: 2
Total occurrences: 17

Document: C:/documents/sample.docx
Occurrences: 11
    Field: content
    Occurrences: 11
        right               3
        rite                4
        wright              1
        write               3
Document: C:/documents/sample.txt
Occurrences: 6
    Field: content
    Occurrences: 6
        right               4
        write               2

Search Homophones and Printing Results using C# – Complete Code

The following C# code combines the above steps: it first finds all the homophones of the query, and then prints the occurrences of each homophone in each document within the provided folder.
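A complete sketch that combines the search and the printing loop in one .NET 6+ top-level program. It assumes the GroupDocs.Search NuGet package is installed and uses the member names from the GroupDocs.Search for .NET examples; verify them for your version. The folder paths are placeholders, and Index is fully qualified to avoid a clash with System.Index.

```csharp
using System;
using GroupDocs.Search.Options;
using GroupDocs.Search.Results;

// Placeholder paths: replace them with your own folders.
string indexFolder = @"C:\MyIndex\";
string documentsFolder = @"C:\documents\";
string query = "right";

// Create the index and add the documents folder to it.
var index = new GroupDocs.Search.Index(indexFolder);
index.Add(documentsFolder);

// Search with homophone matching enabled.
var options = new SearchOptions { UseHomophoneSearch = true };
SearchResult result = index.Search(query, options);

// Summary of the whole result set.
Console.WriteLine("Query: " + query);
Console.WriteLine("Documents: " + result.DocumentCount);
Console.WriteLine("Total occurrences: " + result.OccurrenceCount);
Console.WriteLine();

// Breakdown per document, per field and per found term.
for (int i = 0; i < result.DocumentCount; i++)
{
    FoundDocument document = result.GetFoundDocument(i);
    Console.WriteLine("Document: " + document.DocumentInfo.FilePath);
    Console.WriteLine("Occurrences: " + document.OccurrenceCount);

    foreach (FoundDocumentField field in document.FoundFields)
    {
        Console.WriteLine("\tField: " + field.FieldName);
        Console.WriteLine("\tOccurrences: " + field.OccurrenceCount);

        // Skip fields without individual terms; otherwise print each term and its count.
        if (field.Terms == null) continue;
        for (int k = 0; k < field.Terms.Length; k++)
        {
            // Term padded to 20 characters, followed by its number of occurrences.
            Console.WriteLine("\t\t" + field.Terms[k].PadRight(20) + field.TermsOccurrences[k]);
        }
    }
}
```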

Conclusion

To sum up, you have learned how to find words and their homophones in multiple documents of a specified folder using C#. You can now try building your own .NET application that searches for homophones within multiple files using GroupDocs.Search for .NET.

Learn more about the .NET Search Automation API from the documentation. To try out the features, have a look at the examples available in the GitHub repository. For any questions, reach us via the forum.
