GroupDocs.Parser for .NET

Today, we are excited to announce the release of version 18.7 of GroupDocs.Parser for .NET. This release adds support for extracting text areas from document pages, which can help you gather data for text analysis. We recommend that you upgrade to the latest version and share your feedback.

## Extracting Text Areas

Extracting text areas is useful when you need data for text analysis. To support this, each text extractor implements its own internal private class and exposes it through the DocumentContent property (see PdfTextExtractor for an example). The DocumentContent class has the following members:




*   Returns the total number of document pages
*   Releases resources used by the class
*   Returns a document page (see below)
*   Returns a collection of TextArea objects (see below)

The following code sample shows how to get text areas from a PDF document.

```csharp
using System;
using System.Collections.Generic;

// Create a text extractor
PdfTextExtractor extractor = new PdfTextExtractor("invoice.pdf");

// Create search options
TextAreaSearchOptions searchOptions = new TextAreaSearchOptions();
// Set a regular expression to search for 'Invoice # XXX' text
// (a verbatim string keeps the backslashes intact)
searchOptions.Expression = @"\s?INVOICE\s?#\s?[0-9]+";
// Limit the search to a rectangle
searchOptions.Rectangle = new GroupDocs.Parser.Rectangle(10, 10, 300, 150);

// Get text areas
IList<TextArea> texts = extractor.DocumentContent.GetTextAreas(0, searchOptions);

// Iterate over the list
foreach (TextArea area in texts)
{
    // Print the text
    Console.WriteLine(area.Text);
}
```
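
The search expression in the sample above is an ordinary regular expression, so it can be tried on plain strings before it is used in a search. The following is a minimal sketch that uses only the standard .NET Regex class, not GroupDocs.Parser. The names InvoicePatternDemo and FindInvoiceLabel are hypothetical and exist only for illustration. It also assumes the pattern is evaluated the way .NET evaluates it by default (case-sensitively), which this announcement does not state.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of GroupDocs.Parser.
public static class InvoicePatternDemo
{
    // Same pattern as in the sample above, written as a verbatim string.
    public const string Pattern = @"\s?INVOICE\s?#\s?[0-9]+";

    // Returns the first match of the pattern in the input, or null if there is none.
    public static string FindInvoiceLabel(string input)
    {
        Match match = Regex.Match(input, Pattern);
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        string[] samples =
        {
            "INVOICE # 12345", // matches: at most one whitespace character around '#'
            "INVOICE#7",       // matches: every \s? is optional
            "Invoice # 12345", // no match: default .NET matching is case-sensitive
            "INVOICE  # 1"     // no match: two spaces, but \s? allows at most one
        };

        foreach (string sample in samples)
        {
            string found = FindInvoiceLabel(sample);
            Console.WriteLine("{0} -> {1}", sample, found ?? "(no match)");
        }
    }
}
```

If the documents may contain "Invoice" in mixed case, passing RegexOptions.IgnoreCase to Regex.Match or widening the pattern would cover that in plain .NET. Whether TextAreaSearchOptions offers an equivalent setting should be checked in the product documentation.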

## Available Channels and Resources

Here are a few channels and resources for downloading, learning, trying and getting technical support for GroupDocs.Parser:

*   [Installation]( "GroupDocs.Text Nuget Package") - Install GroupDocs.Parser using NuGet
*   [Documentation]( "GroupDocs.Text Documentation") - Product Docs
*   [Examples]( "GroupDocs.Text Github repository") - GitHub Source Code Examples
*   [Video Tutorials]( "GroupDocs.Text for .NET tutorials") - YouTube Video Tutorials
*   [Product Support Forum]( "GroupDocs.Text for .NET Support forum") - Technical Support Forum for GroupDocs.Parser

