Regex Search using C# .NET

Regular expressions (regex) are compact patterns that describe the text you want to find, which makes them a powerful way to locate specific information. This article shows how to run a regex search in C# across documents of various file formats.

.NET API for Regex Search in Documents

For regex search, we’ll use the GroupDocs.Search for .NET API. Given a regex pattern, it searches the text of files in various formats across folders. The library lets us programmatically search a wide range of file formats, such as Word documents, spreadsheets, presentations, PDF files, markup files, eBooks, email messages, OneNote documents, and ZIP archives.

For the full list of supported file formats, refer to the documentation.

You can either download the DLLs or the MSI installer from the downloads section, or install the API into your .NET application using NuGet.

How to Search in Files by Regex using C#

Follow these steps to perform a regex search in multiple files of various file formats within folders using C#. Afterward, you can generate a highlighted HTML output file for each found document.

  • Create an Index by providing a folder path.
  • Add the parent folder of the documents you want to search to the created index.
  • Define the regex search query.
  • Execute the search using the Search method to get the search results.
  • Iterate over the search results to produce the output you need.
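
The steps above can be sketched roughly as follows. This is a minimal sketch rather than the article's original listing: the folder paths are hypothetical placeholders, and it assumes the GroupDocs.Search NuGet package is installed and that a leading caret (^) marks a query as a regular expression, which is worth confirming against the documentation for your version.

```csharp
using System;
using GroupDocs.Search;
using GroupDocs.Search.Results;

class RegexSearchSketch
{
    static void Main()
    {
        // Hypothetical paths: replace them with your own folders.
        string indexFolder = @"C:\RegexSearch\Index";
        string documentsFolder = @"C:\RegexSearch\Documents";

        // Step 1: create an index stored in the given folder.
        Index index = new Index(indexFolder);

        // Step 2: add the parent folder of the documents to the index.
        index.Add(documentsFolder);

        // Step 3: define the regex query.
        // Assumption: the leading caret tells the library to treat the rest as a regex.
        // "(.)\1" matches any character immediately followed by itself.
        string query = "^(.)\\1";

        // Step 4: execute the search.
        SearchResult result = index.Search(query);

        // Step 5: use the results, here just a short summary.
        Console.WriteLine("Documents found: " + result.DocumentCount);
        Console.WriteLine("Total occurrences: " + result.OccurrenceCount);
    }
}
```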

The regex used in this example identifies all words that contain consecutive repeated characters, such as added, wood, and see. With it, the C# code performs a quick regex search in multiple files of different file formats across folders.
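
To see what such a pattern matches independently of any search library, here is a small check that uses only the standard System.Text.RegularExpressions namespace. The class name, method name, and sample sentence are made up for illustration, and the pattern is written for whole-word matching in plain text, so it is not necessarily the exact query string you would pass to the search API.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RepeatedCharDemo
{
    // A whole word (\b...\b) that contains some word character (\w)
    // immediately followed by the same character (\1).
    private static readonly Regex Pattern = new Regex(@"\b\w*(\w)\1\w*\b");

    // Returns every word in the text that has a doubled character.
    public static List<string> FindWords(string text)
    {
        var words = new List<string>();
        foreach (Match match in Pattern.Matches(text))
        {
            words.Add(match.Value);
        }
        return words;
    }

    public static void Main()
    {
        List<string> found = FindWords("We added wood so you can see the fire");
        Console.WriteLine(string.Join(", ", found)); // prints "added, wood, see"
    }
}
```

The comparison is case-sensitive, so a word like "Aardvark" would not match unless RegexOptions.IgnoreCase is passed to the Regex constructor.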

Here are the highlighted results of the regex search:

Highlighted Results of Regex Search in HTML format

Printing Search Results

The search results can be presented in two ways:

  1. Highlight all the found words.
  2. Print the results in a readable, analyzable format.

The printed output looks like the following:
Document: English.txt
Occurrences: 83
	Field: content
	Occurrences: 82
		acceptance          1
		added               1
		agreeable           1
		agreed              1
		all                 4
		appearance          1
		assurance           1
...
===========================================
Document: Lorem ipsum.docx
Occurrences: 945
...
	Field: content
	Occurrences: 939
		accumsan            39
		class               7
		commodo             40
		convallis           38
		dignissim           35
		efficitur           46
		fringilla           40
		habitasse           2
		laoreet             27
		massa               63
		mattis              31
...
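
A listing like the one above could be produced by walking the search result document by document, then field by field, then term by term. The following is a sketch of the second option only (printing) and is not the article's original code. It assumes the GroupDocs.Search result types expose the members used here (DocumentCount, GetFoundDocument, FoundFields, Terms, TermsOccurrences); the class name, folder paths, and the 20-character column width are illustrative choices.

```csharp
using System;
using GroupDocs.Search;
using GroupDocs.Search.Results;

class PrintResultsSketch
{
    // Prints each found document, its fields, and the matched terms with counts.
    static void PrintResults(SearchResult result)
    {
        for (int i = 0; i < result.DocumentCount; i++)
        {
            FoundDocument document = result.GetFoundDocument(i);
            Console.WriteLine("Document: " + document.DocumentInfo.FilePath);
            Console.WriteLine("Occurrences: " + document.OccurrenceCount);

            foreach (FoundDocumentField field in document.FoundFields)
            {
                Console.WriteLine("\tField: " + field.FieldName);
                Console.WriteLine("\tOccurrences: " + field.OccurrenceCount);

                // Terms may be absent for some kinds of results, so guard against null.
                if (field.Terms != null)
                {
                    for (int k = 0; k < field.Terms.Length; k++)
                    {
                        // Pad the term to a fixed width so the counts line up in a column.
                        Console.WriteLine("\t\t" + field.Terms[k].PadRight(20) + field.TermsOccurrences[k]);
                    }
                }
            }
            Console.WriteLine("===========================================");
        }
    }

    static void Main()
    {
        // Hypothetical paths: replace them with your own folders.
        Index index = new Index(@"C:\RegexSearch\Index");
        index.Add(@"C:\RegexSearch\Documents");

        // Assumption: a leading caret marks the query as a regular expression.
        SearchResult result = index.Search("^(.)\\1");
        PrintResults(result);
    }
}
```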

Getting a Free License or a Free Trial

Free License

Obtain a temporary license for free to explore this library without constraints.

Free Trial

You can download the free trial from the downloads section.


Conclusion

In this article, we used regex search to find all the words that match a certain pattern within various documents, such as DOCX, PDF, and TXT files, across several folders using C#. Afterward, we presented the search results by highlighting the found words and by printing them in a readable format.

For comprehensive details about the API, readers are advised to refer to the documentation.

Any inquiries or additional discussions can be directed to the available forum.

