eBooks in various formats are common in everyday use, and they can contain images as well as text. If you want to reuse the images from an eBook elsewhere, you can extract them programmatically within your Java application. In this article, you will learn how to automate the extraction of images from eBook files such as EPUB, PDF, FB2, and CHM in Java.
The following topics will be covered below:
- Java API to Extract Images from eBooks
- Extract Images from EPUB eBook in Java
- Extract Images from PDF, FB2, CHM eBooks in Java
Java API to Extract Images from eBooks
GroupDocs.Parser for Java API is a feature-rich API for automating image extraction from eBooks and documents in Java. It also supports parsing and extracting images, text, and metadata from word-processing documents, spreadsheets, PDFs, presentations, emails, ZIP archives, and many other document formats.
Download and Configure
Get the JAR file from the downloads section, or add the following pom.xml configuration to your Maven-based Java application to try the examples below. For details, see the API Reference.
<repositories>
    <repository>
        <id>GroupDocsJavaAPI</id>
        <name>GroupDocs Java API</name>
        <url>http://repository.groupdocs.com/repo/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>com.groupdocs</groupId>
        <artifactId>groupdocs-parser</artifactId>
        <version>21.2</version>
    </dependency>
</dependencies>
Extract Images from EPUB eBook in Java
Let’s start by parsing an EPUB eBook for images. The following steps parse the EPUB eBook and extract all of its images using Java.
- Create a Parser object for the eBook file.
- Call the getImages method to extract all the images from the EPUB eBook.
- Iterate over the extracted images and save each one to disk.
In code, this amounts to opening the EPUB file with a Parser object, calling getImages(), and saving each returned PageImageArea to disk one by one.
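Once each extracted image is available as a stream, the saving step needs only the JDK. The sketch below is a hypothetical helper (ImageSaver and saveAll are illustrative names, not part of GroupDocs.Parser). It assumes you can obtain one InputStream per image; the wiring in the trailing comment (getImageStream with ImageOptions and ImageFormat.Png, and Parser used in try-with-resources) is an assumption to check against the GroupDocs.Parser API reference.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of GroupDocs.Parser): writes each image stream
// to <outputDir>/image-<n>.<extension> and returns the paths it wrote.
class ImageSaver {

    static List<Path> saveAll(Iterable<? extends InputStream> images,
                              Path outputDir, String extension) throws IOException {
        Files.createDirectories(outputDir);
        List<Path> written = new ArrayList<>();
        int imageNumber = 0;
        for (InputStream image : images) {
            Path target = outputDir.resolve("image-" + imageNumber + "." + extension);
            // Copy the bytes, then close the stream before moving to the next image.
            try (InputStream in = image) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
            written.add(target);
            imageNumber++;
        }
        return written;
    }

    // Assumed wiring with GroupDocs.Parser (not compiled here; verify in the API reference):
    //   try (Parser parser = new Parser("ebook.epub")) {
    //       List<InputStream> streams = new ArrayList<>();
    //       for (PageImageArea area : parser.getImages()) {
    //           streams.add(area.getImageStream(new ImageOptions(ImageFormat.Png)));
    //       }
    //       ImageSaver.saveAll(streams, Paths.get("output"), "png");
    //   }
}
```

Collecting the streams into a list first keeps the helper independent of the library. For eBooks with many large images, writing each stream while iterating over getImages() avoids holding them all open at once.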
As a result, all the images are saved to the specified location. Here is one of the extracted images as a sample.
The images can be saved in any of the following image file formats:
Extract Images from PDF, FB2, CHM eBooks in Java
In addition to EPUB, if your eBook is in PDF, FB2, CHM, or another supported format, you can extract its images in the same way. Just pass your eBook to the Parser constructor when creating the object; the getImages method then extracts the images from it using the same Java code.
// Provide different eBook formats to the Parser constructor to extract the images.
// Parser parser = new Parser("ebook.epub");
Parser parser = new Parser("ebook.pdf");
// Parser parser = new Parser("ebook.fb2");
// Parser parser = new Parser("ebook.chm");
Iterable<PageImageArea> images = parser.getImages();
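When the same extraction code runs over several eBooks, giving each one its own output folder keeps images from ebook.pdf from overwriting those from ebook.epub if you reuse one file-naming scheme. The helper below is a hypothetical, JDK-only sketch (OutputFolders and forEbook are illustrative names, not part of GroupDocs.Parser) that derives a folder name from the file's base name and lower-cased extension.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper (not part of GroupDocs.Parser): maps an eBook file
// to its own output folder, e.g. "books/ebook.pdf" -> <root>/ebook-pdf.
class OutputFolders {

    static Path forEbook(Path root, String ebookPath) {
        String name = Paths.get(ebookPath).getFileName().toString();
        int dot = name.lastIndexOf('.');
        // Files without an extension (or dot-files like ".hidden") keep their full name.
        String base = dot > 0 ? name.substring(0, dot) : name;
        String ext = dot > 0 ? name.substring(dot + 1).toLowerCase(Locale.ROOT) : "";
        return root.resolve(ext.isEmpty() ? base : base + "-" + ext);
    }
}
```

For example, OutputFolders.forEbook(Paths.get("output"), "ebook.fb2") returns the path output/ebook-fb2 (using the platform's separator), which you can then use as the target folder for that eBook's images.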
In this article, you learned how to programmatically extract all the images from EPUB, PDF, FB2, and CHM eBooks within your Java applications. You can now try building your own image extractor application using GroupDocs.Parser for Java API.