Implementing full text search involves two main steps:
- Indexing
- Search process
Before we dive into the details, let’s get an overview of this technique. Full text search is a more advanced way of searching for a text query across a collection of documents stored on a computer. It quickly finds all instances of a term, and it works by using text indexes.
One familiar example of full text search is the find feature in word processors and text editors, which helps you locate a phrase or word anywhere in a document.
GroupDocs.Search for .NET is a back-end full text search API that can be integrated into any .NET application without any third-party tool or software dependency. It lets you search across a multitude of document formats in your applications. To search instantly across thousands of documents, the documents must first be added to an index. To get started, all you have to do is add the DLL reference to your project.
If you need to search a large number of documents, whether they share one file format or use several, you need to create an index.
What is an index?
An index stores the scanned text of all the documents. Therefore, when you perform a search operation (search for a specific query), only the index is consulted, rather than the text of the original documents.
An index can be created in memory or on disk. An index created in memory cannot be saved once your program exits. In contrast, an index created on disk can be loaded later to continue working. The following example shows how to create an index on disk.
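To make the memory-versus-disk distinction concrete without relying on any particular library, here is a minimal, library-neutral sketch in Python. It is not GroupDocs.Search code: the function names (`build_index`, `save_index`, `load_index`), the JSON file format, and the sample texts are all assumptions made for illustration, and the documents are assumed to be already reduced to plain text.

```python
import json
import re
import tempfile
from collections import defaultdict
from pathlib import Path


def build_index(documents):
    """Map each lowercased word to the names of the documents containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(name)
    return index


def save_index(index, path):
    """Persist the index as JSON so that a later run can reuse it."""
    serializable = {word: sorted(names) for word, names in index.items()}
    Path(path).write_text(json.dumps(serializable), encoding="utf-8")


def load_index(path):
    """Reload a previously saved index from disk."""
    raw = json.loads(Path(path).read_text(encoding="utf-8"))
    return {word: set(names) for word, names in raw.items()}


# Hypothetical sample documents, already reduced to plain text.
documents = {
    "source.docx": "A short video tutorial about indexing",
    "excel.xlsx": "Quarterly sales figures",
}

# In-memory index: it exists only for the lifetime of this process.
in_memory = build_index(documents)

# On-disk index: written to a file and loaded back. A temporary folder keeps
# this demo self-cleaning; a real index would live in a permanent folder.
with tempfile.TemporaryDirectory() as folder:
    index_file = Path(folder) / "index.json"
    save_index(in_memory, index_file)
    reloaded = load_index(index_file)

print(sorted(reloaded["video"]))  # ['source.docx']
```

The point of the sketch is only that the on-disk form can be reloaded without reading the original documents again, which is what makes repeated searches over a large collection cheap.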
Once documents are indexed, the index is ready to handle search queries. The following types of search queries are supported:
- Case sensitive
See the complete list in this article.
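As an illustration of what a case-sensitive query means in practice, the following Python sketch answers the same term in both modes using only the index. It is a conceptual example and not the GroupDocs.Search API; `build_index`, `search`, and the sample file names are hypothetical.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each word, with its original casing, to the documents containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"\w+", text):
            index[word].add(name)
    return index


def search(index, term, case_sensitive=False):
    """Answer the query from the index alone; document text is never rescanned."""
    if case_sensitive:
        return sorted(index.get(term, set()))
    wanted = term.lower()
    matches = set()
    for word, names in index.items():
        if word.lower() == wanted:
            matches |= names
    return sorted(matches)


# Hypothetical sample documents.
documents = {
    "a.txt": "Video editing basics",
    "b.txt": "how to stream video online",
}
index = build_index(documents)

print(search(index, "video"))                       # ['a.txt', 'b.txt']
print(search(index, "video", case_sensitive=True))  # ['b.txt']
```

Keeping the original casing in the index is one possible design choice; it lets a single index serve both query modes at the cost of a slower case-insensitive lookup.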
Let’s go through a use case. Suppose we have multiple documents (Word, PDF, Excel and HTML) and we want to run a specific search query (the search term “video”) over them.
For every matching document in documentFolder, we will get the document path and the number of occurrences of the search term.
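The shape of such a result, one path plus an occurrence count per matching document, can be sketched in a few lines of Python. This is a conceptual illustration rather than GroupDocs.Search code: the paths, the sample texts, and the `build_index` helper are assumptions, and the documents are treated as plain text.

```python
import re
from collections import Counter, defaultdict


def build_index(documents):
    """Map each lowercased word to a per-document occurrence count."""
    index = defaultdict(Counter)
    for path, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word][path] += 1
    return index


# Hypothetical paths and contents standing in for real files.
documents = {
    "documents/source.docx": "Watch the video. The video explains indexing.",
    "documents/excel.xlsx": "video budget 2020",
    "documents/notes.html": "No match here",
}
index = build_index(documents)

# Look up the term; documents without the term simply do not appear.
hits = index.get("video", Counter())
for path, count in sorted(hits.items()):
    print(f"{path}: {count} occurrence(s)")
```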
Let’s generate HTML output with the search results highlighted.
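Highlighting comes down to wrapping every match in markup while escaping the rest of the text. The Python sketch below shows one way to do that for plain text; it is not the GroupDocs.Search highlighter, and the `highlight` function and the choice of the `<mark>` tag are assumptions for illustration.

```python
import html
import re


def highlight(text, term):
    """Return an HTML fragment with each whole-word match of `term` in <mark>."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    pieces = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the text between matches so it cannot break the HTML.
        pieces.append(html.escape(text[last:match.start()]))
        pieces.append("<mark>" + html.escape(match.group()) + "</mark>")
        last = match.end()
    pieces.append(html.escape(text[last:]))
    return "".join(pieces)


print(highlight("Video & audio: play the video <now>", "video"))
# <mark>Video</mark> &amp; audio: play the <mark>video</mark> &lt;now&gt;
```

Writing one such fragment per matching document to its own `.html` file would give one output file per document.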
As output, we get 6 HTML files. Each file shows the content of a different document (e.g. excel.xlsx, source.docx, target.docx) with the search term highlighted. Given below is the output for the excel.xlsx and source.docx files.