
Extracting Text Areas
Extracting text areas is useful when you need to get text data for analysis. To support this, each text extractor implements its own internal private class and exposes it through the DocumentContent property (see PdfTextExtractor for an example). The DocumentContent class has the following members:

Member | Description |
---|---|
PageCount | Returns the total number of document pages |
Dispose | Releases the resources used by the class |
GetPage | Returns a document page (see below) |
GetTextAreas | Returns a collection of TextArea objects (see below) |
// Create a text extractor
PdfTextExtractor extractor = new PdfTextExtractor("invoice.pdf");
// Create search options
TextAreaSearchOptions searchOptions = new TextAreaSearchOptions();
// Set a regular expression to search 'Invoice # XXX' text
searchOptions.Expression = "\\s?INVOICE\\s?#\\s?[0-9]+";
// Limit the search with a rectangle
searchOptions.Rectangle = new GroupDocs.Parser.Rectangle(10, 10, 300, 150);
// Get text areas
IList<TextArea> texts = extractor.DocumentContent.GetTextAreas(0, searchOptions);
// Iterate over the list
foreach (TextArea area in texts)
{
    // Print the text of the area
    Console.WriteLine(area.Text);
}
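
To see which strings the search expression in the example above accepts, it can help to try the pattern on its own. The sketch below uses only the standard .NET Regex class and assumes that TextAreaSearchOptions.Expression follows .NET regular expression semantics with default options, which this page does not confirm. The class name InvoicePatternDemo, the FindInvoice helper, and the sample strings are hypothetical and exist only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class InvoicePatternDemo
{
    // Same pattern as in the example above, written as a verbatim string.
    public const string Pattern = @"\s?INVOICE\s?#\s?[0-9]+";

    // Hypothetical helper: returns the first match, or an empty string if none.
    public static string FindInvoice(string input)
    {
        Match match = Regex.Match(input, Pattern);
        return match.Success ? match.Value : string.Empty;
    }

    public static void Main()
    {
        string[] samples =
        {
            "INVOICE # 1024",   // at most one whitespace character around '#'
            "INVOICE#7",        // whitespace is optional
            "Ref: INVOICE #88", // the leading \s? captures the preceding space
            "Invoice # 12",     // different letter case
            "INVOICE  # 5"      // two spaces before '#'
        };

        foreach (string sample in samples)
        {
            string found = FindInvoice(sample);
            Console.WriteLine("{0} -> {1}", sample,
                found.Length > 0 ? "'" + found + "'" : "no match");
        }
    }
}
```

Under these assumptions the last two samples do not match: the pattern is case-sensitive and allows at most one whitespace character at each position. If the documents you process vary in case or spacing, the expression may need to be loosened, for example by using `\s*` in place of `\s?`.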
Available Channels and Resources
Here are a few channels and resources for downloading, learning, trying, and getting technical support for GroupDocs.Parser:
- Installation – Install GroupDocs.Parser using NuGet
- Documentation – Product Docs
- Examples – GitHub Source Code Examples
- Video Tutorials – YouTube Video Tutorials
- Product Support Forum – Technical Support Forum for GroupDocs.Parser