An electronic book, popularly known as an eBook, is a book in digital form that can be read on various electronic devices, including dedicated eReaders such as Kindle, as well as laptops, desktop computers, and smartphones. Many eBook file formats are in use, including EPUB, FictionBook (FB2), Microsoft Compiled HTML Help (CHM), DjVu, MOBI, and PDF. This article shows programmers how to extract images from eBooks in C# within .NET applications.
The following topics will be covered below:
- .NET API for Image Extraction from eBooks
- Extract Images from EPUB eBook in C#
- Extract Images from FB2, CHM eBooks in C#
.NET API for Image Extraction from eBooks
To extract images from eBooks, the C# examples in this article use the GroupDocs.Parser for .NET API. Besides eBooks, this API supports parsing and extracting images from word-processing documents, spreadsheets, PDFs, presentations, emails, ZIP archives, and many other document formats.
You can download the DLLs or MSI installer from the downloads section or install the API in your .NET application via NuGet.
PM> Install-Package GroupDocs.Parser
Extract Images from EPUB eBook in C#
Let’s start by parsing an EPUB eBook for images. The following steps parse the EPUB eBook and extract all the images in it.
- Create a Parser class object.
- Use the GetImages method to extract all the images from the EPUB eBook.
- Iterate over the extracted images and save them one by one.
The following C# code implements these steps: it parses the EPUB eBook and saves the extracted images to disk one by one.
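A minimal sketch of the three steps, assuming the `Parser`, `PageImageArea`, `ImageOptions`, and `ImageFormat` types from the GroupDocs.Parser package. The file name `ebook.epub`, the output folder `extracted-images`, and the class name are hypothetical placeholders, not part of the API.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Options;

class ExtractEpubImages
{
    static void Main()
    {
        // Hypothetical input file and output folder; adjust to your environment.
        string ebookPath = "ebook.epub";
        string outputDir = "extracted-images";
        Directory.CreateDirectory(outputDir);

        // Step 1: create the Parser object for the eBook.
        using (Parser parser = new Parser(ebookPath))
        {
            // Step 2: extract all images from the eBook.
            IEnumerable<PageImageArea> images = parser.GetImages();

            // GetImages returns null when image extraction isn't supported for the format.
            if (images == null)
            {
                Console.WriteLine("Image extraction isn't supported for this document.");
                return;
            }

            // Step 3: save the images one by one, here as PNG files.
            ImageOptions options = new ImageOptions(ImageFormat.Png);
            int imageNumber = 0;
            foreach (PageImageArea image in images)
            {
                string target = Path.Combine(outputDir, "image-" + imageNumber + ".png");
                image.Save(target, options);
                imageNumber++;
            }

            Console.WriteLine("Saved " + imageNumber + " image(s) to " + outputDir);
        }
    }
}
```

The `using` block disposes the parser once extraction finishes, and the null check guards against formats that don't support image extraction.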
As a result, all the images available in the eBook are saved to disk.
You can save the extracted images in any of the following supported image file formats:
- JPG
- PNG
- WEBP
- GIF
- BMP
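The output format is chosen when the images are saved, not when they are extracted. As a hedged sketch, the snippet below saves every image as JPG by passing `ImageFormat.Jpeg` to `ImageOptions`. The enum member name is assumed from the GroupDocs.Parser options namespace, and the file names are hypothetical.

```csharp
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Options;

class SaveImagesAsJpeg
{
    static void Main()
    {
        // "ebook.epub" is a hypothetical path.
        using (Parser parser = new Parser("ebook.epub"))
        {
            IEnumerable<PageImageArea> images = parser.GetImages();
            if (images == null)
            {
                return; // image extraction isn't supported for this format
            }

            // The target format is set here; keep the file extension consistent with it.
            ImageOptions options = new ImageOptions(ImageFormat.Jpeg);
            int imageNumber = 0;
            foreach (PageImageArea image in images)
            {
                image.Save("image-" + imageNumber + ".jpg", options);
                imageNumber++;
            }
        }
    }
}
```

Switching to another supported format should only require changing the `ImageFormat` value and the file extension.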
Extract Images from FB2, CHM eBooks in C#
If your eBook is in FB2, CHM, or some other format, you can extract its images in the same way. Just pass your eBook to the Parser constructor when creating the object. The GetImages method then extracts the images from the provided eBook using the same C# code.
// Pass the FB2, CHM, PDF, or any other eBook to the Parser constructor
using (Parser parser = new Parser("ebook.fb2")) // FB2
// using (Parser parser = new Parser("ebook.chm")) // CHM
// using (Parser parser = new Parser("ebook.pdf")) // PDF
{
    // GetImages returns null if image extraction isn't supported for the format
    IEnumerable<PageImageArea> images = parser.GetImages();
}
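To illustrate that the same code path serves every format, here is a sketch that wraps the extraction in one helper and runs it over several eBooks. The helper `ExtractImages`, the file names, and the per-book output folder naming are hypothetical choices for this example. Only `Parser`, `GetImages`, `PageImageArea.Save`, `ImageOptions`, and `ImageFormat` come from GroupDocs.Parser.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Options;

class ExtractFromManyFormats
{
    // Hypothetical helper: saves all images of one eBook as PNG files
    // and returns how many images were saved.
    static int ExtractImages(string ebookPath, string outputDir)
    {
        Directory.CreateDirectory(outputDir);

        using (Parser parser = new Parser(ebookPath))
        {
            IEnumerable<PageImageArea> images = parser.GetImages();
            if (images == null)
            {
                return 0; // image extraction isn't supported for this format
            }

            ImageOptions options = new ImageOptions(ImageFormat.Png);
            int count = 0;
            foreach (PageImageArea image in images)
            {
                image.Save(Path.Combine(outputDir, "image-" + count + ".png"), options);
                count++;
            }
            return count;
        }
    }

    static void Main()
    {
        // Hypothetical input files; the calling code is identical for each format.
        string[] ebooks = { "ebook.fb2", "ebook.chm", "ebook.pdf" };
        foreach (string ebook in ebooks)
        {
            string outputDir = Path.GetFileName(ebook) + "-images";
            int saved = ExtractImages(ebook, outputDir);
            Console.WriteLine(ebook + ": " + saved + " image(s) saved");
        }
    }
}
```

Writing each book's images to its own folder keeps the numbered file names from colliding across eBooks.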
Conclusion
You should now be comfortable with programmatically extracting all the images from eBooks in EPUB, FB2, CHM, and other file formats within your .NET applications. You can even build your own image extractor application using the GroupDocs.Parser for .NET API.
To learn more about the API, see the documentation or the open-source examples on GitHub. If you run into any issues, you can contact support on the forum.