Extract Text Areas from Document Pages using GroupDocs.Parser for .NET 18.7

Today, we are excited to announce the release of version 18.7 of GroupDocs.Parser for .NET. This release adds support for extracting text areas from document pages, which can help you collect data for text analysis. We recommend that you upgrade to the latest version of the API and share your feedback.

Extracting Text Areas

Extracting text areas is useful when you need data for text analysis. To support this, each text extractor implements its own internal (private) class and exposes it through the DocumentContent property (see PdfTextExtractor for an example). The DocumentContent class has the following members:
PageCount: Returns the total number of document pages
Dispose: Releases resources used by the class
GetPage: Returns a document page (see below)
GetTextAreas: Returns a collection of TextArea objects (see below)
The following code sample shows how to get text areas from a PDF document.
// Requires System and System.Collections.Generic, plus the GroupDocs.Parser namespaces
// Create a text extractor
PdfTextExtractor extractor = new PdfTextExtractor("invoice.pdf");
// Create search options
TextAreaSearchOptions searchOptions = new TextAreaSearchOptions();
// Set a regular expression to search for 'INVOICE # XXX' text
searchOptions.Expression = "\\s?INVOICE\\s?#\\s?[0-9]+";
// Limit the search to a rectangle
searchOptions.Rectangle = new GroupDocs.Parser.Rectangle(10, 10, 300, 150);
// Get text areas
IList<TextArea> texts = extractor.DocumentContent.GetTextAreas(0, searchOptions);
// Iterate over the list
foreach (TextArea area in texts)
{
    // Print the text of the area (assumes TextArea exposes a Text property)
    Console.WriteLine(area.Text);
}
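
Before running a search against a document, it can help to check which strings the search expression accepts. The following is a minimal sketch that tests the same pattern with the standard .NET Regex class only. It does not call GroupDocs.Parser, and it assumes the library evaluates the expression the way .NET regular expressions do, which this post does not state. The class name InvoicePatternDemo, the helper IsInvoiceLabel, and the sample strings are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InvoicePatternDemo
{
    // Same pattern as in the sample above, written as a verbatim string
    private const string Pattern = @"\s?INVOICE\s?#\s?[0-9]+";

    // Hypothetical helper: true if the input contains text matching the pattern.
    // Regex.IsMatch is case-sensitive unless RegexOptions.IgnoreCase is passed.
    public static bool IsInvoiceLabel(string input)
    {
        return Regex.IsMatch(input, Pattern);
    }

    public static void Main()
    {
        // Made-up inputs for illustration, not taken from a real document
        string[] samples =
        {
            "INVOICE # 3451", // matches: optional spaces around '#', then digits
            "INVOICE#77",     // matches: the spaces are optional
            "Invoice # 12",   // no match: the pattern is upper-case only
            "INVOICE #"       // no match: at least one digit is required
        };

        foreach (string sample in samples)
        {
            Console.WriteLine(sample + " -> " + IsInvoiceLabel(sample));
        }
    }
}
```

If the labels in your documents vary in case or spacing, adjust the expression accordingly before assigning it to the search options.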

Have Queries?

If you have any questions or concerns about the API, please get in touch with us on the forum. We’ll be glad to help.