Build your full text search solution in C#

Implementing full text search involves two main steps:

  • Indexing
  • Search process

Before we dive into the details, let’s get an overview of the technique. Full text search is a way of searching for a term or query across a collection of documents. Rather than scanning the text of every document at query time, it relies on a text index, which lets it quickly find every instance of a term.

A familiar example of full text search is the find feature in word processors and text editors, which locates a word or phrase anywhere in a document.

API Usage

GroupDocs.Search for .NET is a back-end full text search API that can be integrated into any .NET application without any third-party tools or software dependencies. It lets your applications search across a wide range of document formats. To search thousands of documents instantly, the documents must first be added to an index. To get started, all you need to do is add a reference to the DLL in your project.

Indexing

If you need to search a large number of documents, whether they share a file format or not, you first need to create an index.

What is an index?

An index stores the scanned text of all the indexed documents. When you run a search query, only the index is consulted, not the text of the original documents.
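To make the idea concrete, here is a minimal, library-free sketch of an inverted index, the data structure most full text search engines are built around. It is not how GroupDocs.Search is implemented internally; the sample documents and the Tokenize and Lookup helpers are hypothetical. The index is built once, and queries are then answered from it without rereading the documents.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample documents: path -> already-extracted plain text.
var documents = new Dictionary<string, string>
{
    ["a.txt"] = "Full text search uses an index.",
    ["b.txt"] = "The index maps each term to documents.",
    ["c.txt"] = "Scanning every document is slow."
};

// Split text into lowercase word tokens.
IEnumerable<string> Tokenize(string input) =>
    Regex.Matches(input.ToLowerInvariant(), @"\w+").Select(m => m.Value);

// Build the inverted index once: term -> sorted set of documents containing it.
var index = new Dictionary<string, SortedSet<string>>();
foreach (var (path, text) in documents)
{
    foreach (var term in Tokenize(text))
    {
        if (!index.TryGetValue(term, out var postings))
            index[term] = postings = new SortedSet<string>();
        postings.Add(path);
    }
}

// Queries consult only the index; the document texts are never rescanned.
IEnumerable<string> Lookup(string term) =>
    index.TryGetValue(term.ToLowerInvariant(), out var hits) ? hits : Enumerable.Empty<string>();

Console.WriteLine(string.Join(", ", Lookup("index"))); // a.txt, b.txt
```

Building the index costs one pass over every document, but each lookup afterwards is a single dictionary access, which is why searching stays fast as the collection grows.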

Index creation

An index can be created either in memory or on disk. An in-memory index is lost when your program exits, whereas an on-disk index can be loaded again later so you can continue working with it. The following example shows how to create an index on disk.
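As a rough, library-independent illustration of the in-memory versus on-disk distinction, the sketch below saves a hand-rolled index to a JSON file with System.Text.Json and loads it back. It is a stand-in under stated assumptions, not the GroupDocs.Search on-disk format; the sample terms, the indexFolder location, and the index.json file name are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// A tiny in-memory inverted index (hypothetical data): term -> document paths.
// On its own it would be lost as soon as the program exits.
var index = new Dictionary<string, List<string>>
{
    ["video"] = new List<string> { "source.docx", "excel.xlsx" },
    ["search"] = new List<string> { "source.docx" }
};

// Hypothetical on-disk location for the index.
string indexFolder = Path.Combine(Path.GetTempPath(), "MyIndex");
Directory.CreateDirectory(indexFolder);
string indexFile = Path.Combine(indexFolder, "index.json");

// Save: serialize the index to a file so it outlives this process.
File.WriteAllText(indexFile, JsonSerializer.Serialize(index));

// Load: a later run can deserialize the file and continue where it left off.
var reloaded = JsonSerializer.Deserialize<Dictionary<string, List<string>>>(File.ReadAllText(indexFile))
    ?? throw new InvalidOperationException("The index file is empty.");

Console.WriteLine(string.Join(", ", reloaded["video"])); // source.docx, excel.xlsx
```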

Perform Search

Once documents are indexed, the index is ready to handle search queries. The following types of search queries are supported:

  • Simple
  • Case sensitive
  • Boolean
  • Phrasal
  • Faceted

See the complete list in this article.
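Boolean queries map naturally onto set operations over the document lists stored in an inverted index. The minimal sketch below evaluates AND, OR, and NOT with HashSet&lt;T&gt;; the postings and the helper names (Docs, And, Or, Not) are hypothetical and do not reflect GroupDocs.Search query syntax. Term lookup ignores case, as a simple (non case-sensitive) query would.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical posting lists from an inverted index: term -> documents containing it.
// OrdinalIgnoreCase makes term lookup case-insensitive, as in a simple query.
var index = new Dictionary<string, HashSet<string>>(StringComparer.OrdinalIgnoreCase)
{
    ["video"] = new HashSet<string> { "source.docx", "target.docx", "excel.xlsx" },
    ["audio"] = new HashSet<string> { "source.docx", "page.html" },
    ["draft"] = new HashSet<string> { "target.docx" }
};

// Copy a posting list so the set operations below never modify the index itself.
HashSet<string> Docs(string term) =>
    index.TryGetValue(term, out var hits) ? new HashSet<string>(hits) : new HashSet<string>();

// a AND b: documents containing both terms.
HashSet<string> And(string a, string b)
{
    var result = Docs(a);
    result.IntersectWith(Docs(b));
    return result;
}

// a OR b: documents containing at least one of the terms.
HashSet<string> Or(string a, string b)
{
    var result = Docs(a);
    result.UnionWith(Docs(b));
    return result;
}

// a NOT b: documents containing the first term but not the second.
HashSet<string> Not(string a, string b)
{
    var result = Docs(a);
    result.ExceptWith(Docs(b));
    return result;
}

Console.WriteLine(string.Join(", ", And("video", "audio"))); // source.docx
```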

Let’s go through a use case. Suppose we have multiple documents (Word, PDF, Excel, and HTML) and we want to search them for the term “video”.

For every document in documentFolder that contains the search term, we get the document path and the number of times the term occurs in it.
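The shape of that result, a document path plus an occurrence count, can be sketched with an index that also stores term frequencies. This library-free version only reads .txt files from a hypothetical temporary documentFolder filled with made-up sample text; extracting text from Word, PDF, Excel, and HTML files is the part a library such as GroupDocs.Search takes care of, and it is not attempted here. The Search helper is likewise hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

// Hypothetical documentFolder filled with plain-text samples. Real Word, PDF or Excel
// files would first need their text extracted, which this sketch does not attempt.
string documentFolder = Path.Combine(Path.GetTempPath(), "fts-demo-" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(documentFolder);
File.WriteAllText(Path.Combine(documentFolder, "notes.txt"), "Video tutorial: record the video first.");
File.WriteAllText(Path.Combine(documentFolder, "todo.txt"), "Buy milk.");

// Index with term frequencies: term -> (document path -> occurrence count).
var index = new Dictionary<string, Dictionary<string, int>>();
foreach (var path in Directory.EnumerateFiles(documentFolder, "*.txt"))
{
    foreach (Match word in Regex.Matches(File.ReadAllText(path).ToLowerInvariant(), @"\w+"))
    {
        if (!index.TryGetValue(word.Value, out var counts))
            index[word.Value] = counts = new Dictionary<string, int>();
        counts.TryGetValue(path, out int seen);
        counts[path] = seen + 1;
    }
}

// Every document containing the term, with its number of occurrences.
IReadOnlyDictionary<string, int> Search(string term) =>
    index.TryGetValue(term.ToLowerInvariant(), out var hits) ? hits : new Dictionary<string, int>();

foreach (var (path, occurrences) in Search("video"))
    Console.WriteLine($"{path}: {occurrences} occurrence(s)");
```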

Next, let’s generate HTML output with the search results highlighted.
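Highlighting essentially means wrapping each match in markup. The sketch below uses a hypothetical Highlight helper, not the GroupDocs.Search highlighter: it HTML-encodes the document text first so its content cannot inject tags, then wraps whole-word, case-insensitive matches in &lt;mark&gt; elements and writes the page to a temporary file whose name is also hypothetical.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: HTML-encode the document text so it cannot inject markup,
// then wrap each whole-word, case-insensitive match of the term in <mark> tags.
string Highlight(string text, string term)
{
    string encoded = WebUtility.HtmlEncode(text);
    string pattern = $@"\b{Regex.Escape(WebUtility.HtmlEncode(term))}\b";
    return Regex.Replace(encoded, pattern, m => $"<mark>{m.Value}</mark>", RegexOptions.IgnoreCase);
}

// Build one output page for a (hypothetical) extracted document text.
string html = "<!DOCTYPE html><html><body><p>"
    + Highlight("Video <b>tips</b>: watch the video.", "video")
    + "</p></body></html>";

// Write the page to the temp folder; the file name is hypothetical.
File.WriteAllText(Path.Combine(Path.GetTempPath(), "highlighted.html"), html);
Console.WriteLine(html);
```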

As output, we get 6 HTML files. Each file shows the content of a different document (e.g. excel.xlsx, source.docx, target.docx) with the search term highlighted. Given below is the output for the excel.xlsx and source.docx files.

We have an open-source example project that you can use to evaluate the API’s features. Go through the developer guide, and if you run into any issues, you can post them on the forum.
