- You don’t know whether the document uses the spelling “color” or “colour”.
- You are searching for “John”, but the name could have been spelled “Jon” or even “Jhon”.
- You need to locate “USA” when the user types “U.S.A.”.
- The word or phrase you are looking for could contain a “mistaek”; oops, that should be “mistake”.
This is where fuzzy search comes in. Fuzzy search finds approximate matches rather than exact matches, which makes search queries more forgiving. It is particularly useful when the data contains typos, misspellings, or spelling variations. This article shows how to programmatically perform a fuzzy search in multiple documents across folders using C#.
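To make "approximate match" concrete, here is a minimal, self-contained sketch that scores two words with the Levenshtein edit distance and turns that distance into a similarity between 0 and 1. This is only an illustration of the general idea; it is an assumption that such a measure is a fair stand-in, and it is not the algorithm used internally by any particular library. The class and method names (`FuzzyDemo`, `Levenshtein`, `Similarity`) are made up for this example.

```csharp
using System;

public static class FuzzyDemo
{
    // Levenshtein distance: the minimum number of single-character
    // insertions, deletions, or substitutions needed to turn a into b.
    public static int Levenshtein(string a, string b)
    {
        int[] previous = new int[b.Length + 1];
        int[] current = new int[b.Length + 1];

        // Distance from the empty prefix of a to every prefix of b.
        for (int j = 0; j <= b.Length; j++)
        {
            previous[j] = j;
        }

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                int insertion = current[j - 1] + 1;
                int deletion = previous[j] + 1;
                int substitution = previous[j - 1] + cost;
                current[j] = Math.Min(Math.Min(insertion, deletion), substitution);
            }

            // The row just computed becomes the "previous" row of the next pass.
            (previous, current) = (current, previous);
        }

        return previous[b.Length];
    }

    // Illustrative similarity: 1.0 for identical words, lower as more edits are needed.
    public static double Similarity(string a, string b)
    {
        int longest = Math.Max(a.Length, b.Length);
        if (longest == 0)
        {
            return 1.0;
        }
        return 1.0 - (double)Levenshtein(a, b) / longest;
    }

    public static void Main()
    {
        string[][] pairs =
        {
            new[] { "color", "colour" },
            new[] { "John", "Jon" },
            new[] { "mistaek", "mistake" },
        };

        foreach (string[] pair in pairs)
        {
            Console.WriteLine(pair[0] + " / " + pair[1] + ": " + Similarity(pair[0], pair[1]));
        }
    }
}
```

With a measure like this, a search engine can accept any word whose similarity to the query reaches a chosen threshold instead of requiring an exact match.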
C# Fuzzy Search Library
For fuzzy search, we’ll use the GroupDocs.Search for .NET API. Its fuzzy search feature tolerates a configurable degree of spelling variation, which makes it effective at handling regional differences such as those between British and American English.
This library empowers us to programmatically search text in a wide range of file formats, such as Word documents (DOC, DOCX), spreadsheets (XLS, XLSX), presentations (PPT, PPTX), PDF files, Markup (HTML, XML), Markdown (MD), eBooks (EPUB, CHM, FB2), email messages (MSG, EML), OneNote documents, and ZIP archives.
To find out all the types of files you can work with, check out the documentation.
You can get the DLLs or the MSI installer from the download section, or add the API to your .NET project using NuGet.
Let’s Fuzzy Search in Files using C#
Follow these steps to perform a fuzzy search in multiple files of various file formats within folders using C#:
- Create an Index by providing a folder path.
- Add the parent folder path for the search directory to the created index.
- Define the search query.
- Activate the Fuzzy Search by enabling the option.
- Set the Similarity Level in the Fuzzy Algorithm as required.
- Execute the search using the Search method to get the search results.
- Iterate over the returned SearchResult to create or print the output as you like.
The fuzzy search in the C# code below finds approximate matches of the given query in all files across all subfolders, with a 20% error tolerance in spelling.
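The steps above can be sketched as follows. This is a minimal sketch rather than a definitive implementation: the two folder paths are placeholders (assumptions) that you need to replace, and the namespace layout reflects recent GroupDocs.Search for .NET releases, so it may differ in older versions.

```csharp
using System;
using GroupDocs.Search;
using GroupDocs.Search.Options;
using GroupDocs.Search.Results;

public static class FuzzySearchExample
{
    public static void Main()
    {
        // Placeholder paths (assumptions): point these at your own folders.
        string indexFolder = @"C:\FuzzySearch\Index";
        string documentsFolder = @"C:\FuzzySearch\Documents";

        // Create an index in the given folder and add the parent documents folder to it.
        Index index = new Index(indexFolder);
        index.Add(documentsFolder);

        // Define the search query.
        string query = "nulla";

        // Enable fuzzy search and set the similarity level to 0.8 (an 80% match).
        SearchOptions options = new SearchOptions();
        options.FuzzySearch.Enabled = true;
        options.FuzzySearch.FuzzyAlgorithm = new SimilarityLevel(0.8);

        // Run the search and print a short summary.
        SearchResult result = index.Search(query, options);
        Console.WriteLine("Query: " + query);
        Console.WriteLine("Documents: " + result.DocumentCount);
        Console.WriteLine("Occurrences: " + result.OccurrenceCount);
    }
}
```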
Here, the similarity level is set to 0.8, i.e. an 80% match, which corresponds to a 20% error tolerance. You can adjust the tolerance by changing the similarity level in the code.
Below are the fuzzy search results you can obtain from the above code. Printing them is quite simple, and the printing code is also provided later in this article.
Query: nulla
Documents: 2
Occurrences: 135
    Document: Lorem ipsum.docx
    Occurrences: 132
        Field: content
        Occurrences: 132
            nulla     98
            nullam    34
    Document: EnglishText.txt
    Occurrences: 3
        Field: content
        Occurrences: 3
            dull      1
            full      1
            fully     1
Printing Search Results
The following C# code provides two ways to present your search results.
- Highlight all the approximate matches.
- Print the results in a readable, easy-to-analyze format.
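Both approaches might look like the sketch below. It assumes you already have the `Index`, the query string, and the `SearchResult` from the search step. The helper class `SearchResultPrinter`, its method names, and the `Highlight{i}.html` output file naming are hypothetical choices made for this example, and the namespaces reflect recent GroupDocs.Search for .NET releases.

```csharp
using System;
using System.IO;
using GroupDocs.Search;
using GroupDocs.Search.Common;
using GroupDocs.Search.Highlighters;
using GroupDocs.Search.Results;

public static class SearchResultPrinter
{
    // Option 1: write one HTML file per found document with the matches highlighted.
    public static void HighlightMatches(Index index, SearchResult result, string outputFolder)
    {
        for (int i = 0; i < result.DocumentCount; i++)
        {
            FoundDocument document = result.GetFoundDocument(i);

            // Hypothetical file naming: Highlight0.html, Highlight1.html, ...
            string outputPath = Path.Combine(outputFolder, "Highlight" + i + ".html");
            OutputAdapter outputAdapter = new FileOutputAdapter(OutputFormat.Html, outputPath);
            Highlighter highlighter = new DocumentHighlighter(outputAdapter);
            index.Highlight(document, highlighter);
        }
    }

    // Option 2: print documents, fields, and matched terms with their occurrence counts.
    public static void PrintResults(string query, SearchResult result)
    {
        Console.WriteLine("Query: " + query);
        Console.WriteLine("Documents: " + result.DocumentCount);
        Console.WriteLine("Occurrences: " + result.OccurrenceCount);

        for (int i = 0; i < result.DocumentCount; i++)
        {
            FoundDocument document = result.GetFoundDocument(i);
            Console.WriteLine("\tDocument: " + document.DocumentInfo.FilePath);
            Console.WriteLine("\tOccurrences: " + document.OccurrenceCount);

            foreach (FoundDocumentField field in document.FoundFields)
            {
                Console.WriteLine("\t\tField: " + field.FieldName);
                Console.WriteLine("\t\tOccurrences: " + field.OccurrenceCount);

                // Terms holds the distinct words that matched the query approximately.
                if (field.Terms != null)
                {
                    for (int k = 0; k < field.Terms.Length; k++)
                    {
                        Console.WriteLine("\t\t\t" + field.Terms[k].PadRight(20) + field.TermsOccurrences[k]);
                    }
                }
            }
        }
    }
}
```

After running the search, you would call `SearchResultPrinter.PrintResults(query, result)` for console output, or `SearchResultPrinter.HighlightMatches(index, result, outputFolder)` with an existing output folder to generate the highlighted HTML files.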
Getting a Free License or a Free Trial
Free License
Obtain a temporary license for free to explore this library without constraints.
Free Trial
You can download the free trial from the downloads section.
Conclusion
In this article, we looked at how to perform a fuzzy search programmatically in C# to find all approximately matching words within a chosen degree of error tolerance. This makes fuzzy search effective at accommodating regional language variations such as those between British and American English, typos in text, name variations, and phonetic matching.
For comprehensive details about the API, readers are advised to refer to the documentation.
Any queries or additional discussions can be directed to the forum.