Build your full text search solution in C#

Full Text Search

Implementing full-text search involves two main steps:

  • Indexing
  • Search process

Before we dive into the details, here is an overview of the technique. Full-text search is a more advanced way to search for a query across a collection of documents. It finds all instances of a term quickly because it works from text indexes.

Word processors and text editors are a common example of full-text search: they let you find a word or phrase anywhere in a document.

API Usage

GroupDocs.Search for .NET is a full-text search back-end API that can be integrated into any .NET application without any third-party tool or software dependency. It lets you search across a multitude of document formats in your applications. To make it possible to search instantly across thousands of documents, they must first be added to an index. To get started, all you have to do is add the DLL reference to your project.

Indexing

If you need to search over a large number of documents, whether in the same or in different file formats, you need to create an index.

What is an index?

An index holds the scanned text of all the documents. When you perform a search operation (that is, search for a specific query), only the index is consulted, not the text of the original documents.
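To make the idea concrete, here is a minimal sketch of a toy inverted index in plain C#. It is an illustration only and does not show how GroupDocs.Search stores its index internally; the names `postings`, `AddDocument` and `Search`, as well as the sample documents, are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy inverted index: each lowercased term maps to the set of documents containing it.
// Illustration only; this is not how GroupDocs.Search stores its index.
var postings = new Dictionary<string, HashSet<string>>();
char[] separators = { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?' };

// Scan a document's text once and record which terms occur in it.
void AddDocument(string documentName, string text)
{
    foreach (string term in text.ToLowerInvariant()
                 .Split(separators, StringSplitOptions.RemoveEmptyEntries))
    {
        if (!postings.TryGetValue(term, out var names))
        {
            names = new HashSet<string>();
            postings[term] = names;
        }
        names.Add(documentName);
    }
}

// Answer a query from the index alone, without rereading any document text.
List<string> Search(string term)
{
    return postings.TryGetValue(term.ToLowerInvariant(), out var names)
        ? names.OrderBy(name => name, StringComparer.Ordinal).ToList()
        : new List<string>();
}

// Hypothetical sample documents (name, extracted text).
AddDocument("report.docx", "Video codecs and video players.");
AddDocument("notes.txt", "Audio only, no moving pictures.");
AddDocument("slides.html", "Embedding a video in a web page");

Console.WriteLine(string.Join(", ", Search("Video")));
```

The text of each document is read only once, when it is added. Every later query is a dictionary lookup, which is why searching an index stays fast as the collection grows.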

Index creation

An index can be created in memory or on disk. An index created in memory is lost when your program exits. In contrast, an index created on disk can be loaded later to continue working with it. The following example shows how to create an index on disk.

string indexFolder = @"c:\MyIndex\"; // Specify the path to the index folder
Index index = new Index(indexFolder);

Perform Search

Once documents are indexed, the index is ready to handle search queries. The following types of search queries are supported:

  • Simple
  • Case sensitive
  • Boolean
  • Phrasal
  • Faceted

See the complete list in this article.

string query = "gaps"; // Specify a search query
SearchResult result = index.Search(query); // Searching in the index
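To give a feel for how some of these query types differ, here is a minimal conceptual sketch in plain C# covering simple, case-sensitive, Boolean and phrasal queries. It does not use the GroupDocs.Search query syntax; the helper names (`Term`, `And`, `Or`, `Not`, `Phrase`, `Show`) and the sample texts are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample data: document name -> extracted text.
var documents = new Dictionary<string, string>
{
    ["a.txt"] = "Closing the gaps in video search",
    ["b.txt"] = "Video search without gaps",
    ["c.txt"] = "Search tips for audio files",
};

string[] Tokenize(string text, bool caseSensitive = false) =>
    (caseSensitive ? text : text.ToLowerInvariant())
        .Split(new[] { ' ', '.', ',' }, StringSplitOptions.RemoveEmptyEntries);

// Simple (or case-sensitive) query: documents that contain the term.
HashSet<string> Term(string term, bool caseSensitive = false)
{
    string wanted = caseSensitive ? term : term.ToLowerInvariant();
    return new HashSet<string>(documents
        .Where(d => Tokenize(d.Value, caseSensitive).Contains(wanted))
        .Select(d => d.Key));
}

// Boolean queries are set operations over the per-term results.
HashSet<string> And(HashSet<string> left, HashSet<string> right) =>
    new HashSet<string>(left.Intersect(right));
HashSet<string> Or(HashSet<string> left, HashSet<string> right) =>
    new HashSet<string>(left.Union(right));
HashSet<string> Not(HashSet<string> operand) =>
    new HashSet<string>(documents.Keys.Except(operand));

// Phrasal query: the words must appear next to each other, in order.
HashSet<string> Phrase(string phrase)
{
    string[] words = Tokenize(phrase);
    if (words.Length == 0) return new HashSet<string>();
    return new HashSet<string>(documents
        .Where(d =>
        {
            string[] tokens = Tokenize(d.Value);
            return Enumerable.Range(0, Math.Max(0, tokens.Length - words.Length + 1))
                .Any(start => tokens.Skip(start).Take(words.Length).SequenceEqual(words));
        })
        .Select(d => d.Key));
}

string Show(IEnumerable<string> names) =>
    string.Join(", ", names.OrderBy(n => n, StringComparer.Ordinal));

Console.WriteLine("Simple:         " + Show(Term("gaps")));
Console.WriteLine("Case sensitive: " + Show(Term("Video", caseSensitive: true)));
Console.WriteLine("Boolean:        " + Show(And(Term("search"), Not(Term("video")))));
Console.WriteLine("Phrasal:        " + Show(Phrase("search tips")));
```

This sketch rescans the text on every query to stay short. A real engine would answer the same queries from its index instead.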

Let’s go through a use case. Suppose we have multiple documents (Word, PDF, Excel and HTML) and want to run a specific search query (the term “video”) over them.

string indexFolder = @"D:\Samples\Index";
string documentsFolder = @"D:\Samples\Source";
// Creating index in the specified folder
Index index = new Index(indexFolder);  
// Indexing documents from the specified folder
index.Add(documentsFolder);
// Searching in index
SearchResult result = index.Search("video");
foreach (FoundDocument document in result)
{
     Console.WriteLine("Document Path : " + document.DocumentInfo.FilePath);
     Console.WriteLine("Occurrences : " + document.OccurrenceCount + "\n");
}

This prints the document path and the number of search term occurrences for each matching document in documentsFolder.
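As a rough picture of what an occurrence count per document means, the following sketch counts whole-word, case-insensitive matches in plain strings and lists the matching documents by count. It is not the GroupDocs.Search implementation; `CountOccurrences`, the tuple field names and the sample texts are hypothetical, and the matching rule (whole words only) is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical extracted text for a few documents.
var texts = new Dictionary<string, string>
{
    ["source.docx"] = "This video explains how to embed a video in a document.",
    ["excel.xlsx"] = "Video length Video title Video format",
    ["notes.html"] = "No media here.",
    ["target.docx"] = "Videos are not counted as the whole word video.",
};

// Count whole-word, case-insensitive occurrences of a term in one text.
int CountOccurrences(string text, string term) =>
    Regex.Matches(text, @"\b" + Regex.Escape(term) + @"\b", RegexOptions.IgnoreCase).Count;

// Keep only documents with at least one match, most occurrences first.
var hits = texts
    .Select(t => (Document: t.Key, Count: CountOccurrences(t.Value, "video")))
    .Where(h => h.Count > 0)
    .OrderByDescending(h => h.Count)
    .ThenBy(h => h.Document, StringComparer.Ordinal)
    .ToList();

foreach (var hit in hits)
{
    Console.WriteLine("Document Path : " + hit.Document);
    Console.WriteLine("Occurrences : " + hit.Count + "\n");
}
```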

Full-Text Search Output

Let’s generate output HTML with highlighted search results.

string indexFolder = @"D:\Samples\Index";
string documentsFolder = @"D:\Samples\Source";
// Creating index in the specified folder
Index index = new Index(indexFolder);  
// Indexing documents from the specified folder
index.Add(documentsFolder);
// Searching in index
SearchResult result = index.Search("video");
for (int i = 0; i < result.DocumentCount; i++)
{
     FoundDocument document = result.GetFoundDocument(i); // Getting the i-th found document
     OutputAdapter outputAdapter = new FileOutputAdapter(@"D:\Highlighted" + i + ".html"); // Creating the output adapter to a file
     HtmlHighlighter highlighter = new HtmlHighlighter(outputAdapter); // Creating the highlighter object
     index.Highlight(document, highlighter); // Generating output HTML formatted document with highlighted search results
}

As output, we get one HTML file per found document (6 in this example). Each file shows the content of a different document (e.g. excel.xlsx, source.docx, target.docx) with the search term highlighted. Shown below is the output for the excel.xlsx and source.docx files.
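Conceptually, highlighting means re-emitting the document text as HTML with every match wrapped in a marker element. The sketch below shows that idea on a plain string. It is not how HtmlHighlighter works internally; the `Highlight` function, the use of `<mark>` tags and the whole-word matching rule are assumptions made for illustration.

```csharp
using System;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

// Wrap each whole-word, case-insensitive match of `term` in <mark> tags.
// The surrounding text is HTML-encoded piece by piece so that characters
// such as < and & in the source cannot break the generated markup.
string Highlight(string text, string term)
{
    var html = new StringBuilder();
    int position = 0;
    string pattern = @"\b" + Regex.Escape(term) + @"\b";

    foreach (Match match in Regex.Matches(text, pattern, RegexOptions.IgnoreCase))
    {
        // Text between the previous match and this one.
        html.Append(WebUtility.HtmlEncode(text.Substring(position, match.Index - position)));
        // The match itself, keeping its original casing.
        html.Append("<mark>").Append(WebUtility.HtmlEncode(match.Value)).Append("</mark>");
        position = match.Index + match.Length;
    }

    // Remaining text after the last match.
    html.Append(WebUtility.HtmlEncode(text.Substring(position)));
    return html.ToString();
}

Console.WriteLine(Highlight("Video & audio: a <b>video</b> guide", "video"));
```

Encoding the pieces around each match, rather than encoding the whole text first and searching afterwards, avoids accidentally matching inside an entity such as `&amp;`.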

We have an open-source example project that you can use to evaluate the API features. Please go through the documentation for more details, and if you face any issue, please post it on the forum.