Extract Text Areas from Document Pages using GroupDocs.Parser for .NET 18.7

Today, we are excited to announce the release of version 18.7 of GroupDocs.Parser for .NET. This release adds support for extracting text areas from document pages, a feature that helps you collect data for text analysis. We recommend upgrading to the latest version of the API and welcome your feedback.

Extracting Text Areas

Extracting text areas is useful when you need data for text analysis. To support this, each text extractor implements its own internal private class and exposes it through the DocumentContent property (see PdfTextExtractor for an example). The DocumentContent class has the following members:

Member        Description
PageCount     Returns a total number of document pages
Dispose       Releases resources used by the class
GetPage       Returns a document page (see below)
GetTextAreas  Returns a collection of TextArea objects (see below)

The following code sample shows how to get text areas from a PDF document.

// Create a text extractor
PdfTextExtractor extractor = new PdfTextExtractor("invoice.pdf");
 
// Create search options
TextAreaSearchOptions searchOptions = new TextAreaSearchOptions();
// Set a regular expression to search for 'INVOICE # XXX' text
searchOptions.Expression = "\\s?INVOICE\\s?#\\s?[0-9]+";
// Limit the search with a rectangle
searchOptions.Rectangle = new GroupDocs.Parser.Rectangle(10, 10, 300, 150);
 
// Get text areas
IList<TextArea> texts = extractor.DocumentContent.GetTextAreas(0, searchOptions);
             
// Iterate over a list
foreach(TextArea area in texts)
{
    // Print a text
    Console.WriteLine(area.Text);
}
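
To see which strings the search expression in the sample would accept, the pattern can be tried on its own with the standard .NET Regex class. This is only a minimal sketch: the class name InvoicePatternDemo, the helper FindInvoiceLabel and the sample strings are made up for illustration, and it assumes GroupDocs.Parser evaluates the expression the way .NET does by default (case-sensitive, no extra options), which the release notes do not state.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InvoicePatternDemo
{
    // Same pattern as in the sample above, written as a verbatim string.
    private static readonly Regex InvoicePattern =
        new Regex(@"\s?INVOICE\s?#\s?[0-9]+");

    // Hypothetical helper: returns the matched text, or null when nothing matches.
    public static string FindInvoiceLabel(string text)
    {
        Match match = InvoicePattern.Match(text);
        return match.Success ? match.Value : null;
    }

    public static void Main()
    {
        // Illustrative inputs, not taken from a real document.
        string[] samples = { "INVOICE # 3744", "INVOICE#12", "Invoice # 7", "INVOICE #" };

        foreach (string sample in samples)
        {
            string found = FindInvoiceLabel(sample);
            Console.WriteLine("{0,-16} -> {1}", sample, found ?? "(no match)");
        }
    }
}
```

With default .NET matching, the first two samples match in full, while "Invoice # 7" fails on letter case and "INVOICE #" fails because no digits follow the hash sign. If your documents use mixed-case labels, the expression itself would need to allow for that.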

Available Channels and Resources

Here are a few channels and resources for you to download, learn, try and get technical support on GroupDocs.Parser:

Have Queries?

If you have any questions or concerns about the API, please get in touch with us on the forum. We’ll be glad to address them.

To keep up with our news, you can follow us on Twitter or follow our Facebook page.