Extract TOC from EPUB Documents using GroupDocs.Text for .NET 18.4

We are pleased to announce the release of version 18.4 of GroupDocs.Text for .NET. This version adds the ability to extract the table of contents (TOC) from EPUB documents, as well as detection of the media type of OneNote section (.one) files. The following sections describe the newly added features.

Extracting TOC from EPUB Documents

With version 18.4, you can extract the TOC from EPUB documents. The TOC is exposed through the TableOfContents property of the EpubPackage class. Each entry in the TOC is a TableOfContentsItem, which provides the following members:

  • Text – the text of the item (usually a chapter title)
  • PageIndex – the page index of the item's text (null if the item is just a node without content)
  • Count – the number of sub-items (zero if the item has no sub-items)
  • this[int index] – gets the sub-item at the given index
  • ExtractPage – extracts the text of the item

The following code snippet shows how to extract the TOC from an EPUB document.

// Create a text extractor
using (EpubTextExtractor extractor = new EpubTextExtractor(@"document.epub"))
{
    // Print TOC on the screen
    PrintToc(extractor[0].TableOfContents, 0);
}
 
private static void PrintToc(IEnumerable<TableOfContentsItem> tableOfContents, int depth)
{
    // Use spaces to indicate the depth of the TOC item
    string spaces = new string(' ', depth);
 
    // Iterate over items
    foreach (TableOfContentsItem item in tableOfContents)
    {
        System.Console.Write(spaces);
        // Print the item's text
        System.Console.Write(item.Text);
 
        // If item has a text (it's not just a node)
        if (item.PageIndex.HasValue)
        {
            // Print the text length
            System.Console.Write(string.Format(" ({0})", item.ExtractPage().Length));
        }
 
        System.Console.WriteLine();
 
        // If the item has children
        if (item.Count > 0)
        {
            // Print them
            PrintToc(item, depth + 1);
        }
    }
}
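The recursion in PrintToc can be tried without an EPUB file or the library. The sketch below is only an illustration under stated assumptions: TocNode and TocSketch are hypothetical stand-ins invented for this example and are not part of GroupDocs.Text. TocNode mirrors just the Text and PageIndex members described above, and keeps its sub-items in a plain list.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a TOC entry. This is NOT the GroupDocs.Text
// TableOfContentsItem class; it only mimics Text, PageIndex and sub-items.
public sealed class TocNode
{
    public string Text { get; }
    public int? PageIndex { get; }   // null when the entry is just a grouping node
    public List<TocNode> Children { get; } = new List<TocNode>();

    public TocNode(string text, int? pageIndex, params TocNode[] children)
    {
        Text = text;
        PageIndex = pageIndex;
        Children.AddRange(children);
    }
}

public static class TocSketch
{
    // Depth-first walk that returns one line per entry,
    // indented by one space per nesting level.
    public static List<string> Flatten(IEnumerable<TocNode> nodes, int depth = 0)
    {
        var lines = new List<string>();
        foreach (TocNode node in nodes)
        {
            // Only entries that carry content have a page index
            string suffix = node.PageIndex.HasValue
                ? " (index " + node.PageIndex.Value + ")"
                : "";
            lines.Add(new string(' ', depth) + node.Text + suffix);

            // Recurse into sub-items, one level deeper
            if (node.Children.Count > 0)
            {
                lines.AddRange(Flatten(node.Children, depth + 1));
            }
        }
        return lines;
    }
}
```

Returning the lines instead of writing them to the console keeps the traversal separate from the output, so the same walk could feed a log, a UI tree, or a test.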

Media Type Detector for .one Files

This feature detects the media type of OneNote sections using the NoteMediaTypeDetector class. The following code snippet shows how to use it.

// Create a media type detector
var detector = new NoteMediaTypeDetector();
// Detect a media type by the file name
Console.WriteLine(detector.Detect("section.one"));
// Detect a media type by the content
// (stream is assumed to be an already opened, readable stream over the .one file)
Console.WriteLine(detector.Detect(stream));

Feedback

As always, you are welcome to share your feedback or suggestions to improve this product. Just create a new topic at our forum and our dedicated support team will be there to respond.

To keep up with our news, you can follow us on Twitter or follow our Facebook page.