GroupDocs.Text for .NET

We are excited to announce the release of version 17.12 of the GroupDocs.Text for .NET API. This version introduces a simplified way of extracting plain text as well as formatted text through a single interface. We have also extended support for the IPageTextExtractor interface to OneNote documents. In addition, the latest version lets you retrieve an entity by its name from a ZIP container. Please read on for more details.

GroupDocs.Text for .NET API - New Features

Extracting Text via Extractor Class

This feature allows you to extract text from a file or stream via a simple interface. We have added the ExtractText method to the Extractor class for this purpose. The following code snippet demonstrates the usage of this feature.

// Extract text from the stream
Console.WriteLine(Extractor.Default.ExtractText(stream));

// Extract text from the file
Console.WriteLine(Extractor.Default.ExtractText(fileName)); 

For more details, please visit this documentation article.

Extracting Formatted Text via Extractor Class

This feature allows you to extract formatted text from a file or stream via the Extractor class. We have added the ExtractFormattedText method to the Extractor class for this purpose. The following code snippet shows the usage of this feature.

// Extract formatted text from the stream
Console.WriteLine(Extractor.Default.ExtractFormattedText(stream));
 
// Extract formatted text from the file
Console.WriteLine(Extractor.Default.ExtractFormattedText(fileName)); 

For more details, please visit this documentation article.

Retrieving Entity by Full Name from ZIP Container

This feature allows you to get an entity by its full name from a ZIP container. We have added the GetEntity method to the ZipContainer class to implement this feature. The following sample code shows how to get an entity by its name.

// Create a factory
ExtractorFactory factory = new ExtractorFactory();
// Create Zip container
ZipContainer zip = new ZipContainer(stream);
// Try to get "container.xml" entity from "META-INF" folder
Container.Entity containerEntry = zip.GetEntity("META-INF\\container.xml");
// If the entity isn't found
if (containerEntry == null)
{
    throw new GroupDocsTextException("File not found");
}
 
// Try to create a text extractor
TextExtractor extractor = factory.CreateTextExtractor(containerEntry.OpenStream());
try
{
    // Extract text (if the document type is supported)
    Console.WriteLine(extractor == null ? "Document type isn't supported" : extractor.ExtractAll());
}
finally
{
    // Cleanup
    if (extractor != null)
    {
        extractor.Dispose();
    }
}

For more details, please visit this documentation article.

.NET Text Extraction API - Enhancements

IPageTextExtractor Support for NoteTextExtractor

In GroupDocs.Text for .NET 17.12, we have extended IPageTextExtractor support to OneNote documents. The IPageTextExtractor interface allows you to work with a document’s pages in the same way for all supported document types. The following code snippet shows how to extract the text of OneNote document pages using IPageTextExtractor.

// Create a text extractor
NoteTextExtractor textExtractor = new NoteTextExtractor(stream);

// Check if IPageTextExtractor is supported
IPageTextExtractor pageTextExtractor = textExtractor as IPageTextExtractor;
if (pageTextExtractor != null)
{
    // Iterate over all pages
    for (int i = 0; i < pageTextExtractor.PageCount; i++)
    {
        // Print a page number
        Console.WriteLine(string.Format("{0}/{1}", i, pageTextExtractor.PageCount));
        // Extract text from the page
        Console.WriteLine(pageTextExtractor.ExtractPage(i));
    }
}

For more details, please visit this documentation article.

ITextExtractorWithFormatter Interface

Version 17.12 adds a feature that allows you to get or set the document formatter via the ITextExtractorWithFormatter interface. The ITextExtractorWithFormatter interface has only one property.

DocumentFormatter DocumentFormatter { get; set; }

This property allows you to get or set the document formatter for all types of formatted text extractors. The following code sample demonstrates its usage.

// If the extractor supports ITextExtractorWithFormatter interface
if (extractor is ITextExtractorWithFormatter) {
  // Set MarkdownDocumentFormatter formatter
  (extractor as ITextExtractorWithFormatter).DocumentFormatter = new MarkdownDocumentFormatter();
}

For more details, please visit this documentation article.

GroupDocs.Text for .NET - Available Channels and Resources

Here are a few channels and resources for you to download, learn, try and get technical support on GroupDocs.Text:

Feedback

As always, you are welcome to share your feedback or suggestions to improve this product. Just create a new topic at our forum and our dedicated support team will be there to respond.