If you want to reuse the images embedded in one document in other documents, you can extract them programmatically. In this article, we will learn how to extract images from PDF, Excel, PowerPoint, and Word documents using Java.
- Image Extraction Java API
- Image Extraction from PDF documents in Java
- Extract Images from Word, Excel, PowerPoint documents in Java
- Extract Image from Specific Page in Java

Image Extraction Java API

For the extraction of images, we will use GroupDocs.Parser for Java. This Java API supports the parsing of documents and extraction of images, text, and metadata from word-processing documents, spreadsheets, presentations, archives, and email documents. The following are the document formats supported by the Java API for image extraction.
Document Type | File Formats |
---|---|
Word Processing Documents | DOC, DOCX, DOCM, DOT, DOTX, DOTM, ODT, OTT, RTF |
Spreadsheets | XLS, XLSX, XLSM, XLSB, XLT, XLTX, XLTM, ODS, OTS, XLA, XLAM, NUMBERS |
Presentations | PPT, PPTX, PPTM, PPS, PPSX, PPSM, POT, POTX, POTM, ODP, OTP |
Portable Documents | PDF |
Emails | EML, EMLX, MSG |
Archives | ZIP |
Before you start with the examples below, I recommend setting up the environment by downloading the latest version of the document parsing Java API from the downloads section. Alternatively, add the following configuration to the pom.xml of your Maven-based Java application (the repository entry goes inside `<repositories>` and the dependency inside `<dependencies>`):
<repository>
<id>GroupDocsJavaAPI</id>
<name>GroupDocs Java API</name>
<url>http://repository.groupdocs.com/repo/</url>
</repository>
<dependency>
<groupId>com.groupdocs</groupId>
<artifactId>groupdocs-parser</artifactId>
<version>20.8</version>
</dependency>
Extract Images from PDF Documents in Java

Follow these simple steps to get all images from the PDF document.
- Create an instance of the Parser class with the path of the document.
- Call the getImages method of the Parser class to get all the images.
- Iterate over the returned PageImageArea objects.
- Save each image using the save method of PageImageArea.
It’s done. See the full code below. Extracted images can be saved in BMP, GIF, JPEG, PNG, and WebP formats.
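The save step writes to a relative folder such as filesPath, and the running counter produces names like image_0.png. A minimal sketch of that bookkeeping, using only the Java standard library, might look like the following. ImageOutputPaths and its methods are hypothetical helpers, not part of GroupDocs.Parser, and the sketch assumes the output folder may not exist yet.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of GroupDocs.Parser): prepares the folder
// and builds the numbered file names used when saving extracted images.
final class ImageOutputPaths {
    private ImageOutputPaths() {}

    /** Creates the output directory (and any missing parents) and returns it. */
    static Path prepareDirectory(String directory) {
        try {
            return Files.createDirectories(Paths.get(directory));
        } catch (IOException e) {
            throw new UncheckedIOException("Cannot create " + directory, e);
        }
    }

    /** Builds a name such as image_0.png from a running counter and an extension. */
    static String fileName(int imageNumber, String extension) {
        return String.format("image_%d.%s", imageNumber, extension);
    }

    public static void main(String[] args) {
        Path dir = prepareDirectory("filesPath");
        // Prints the path the first extracted image would be written to.
        System.out.println(dir.resolve(fileName(0, "png")));
    }
}
```

The resulting path can be passed to the save method as a string, for example `dir.resolve(fileName(imageNumber, "png")).toString()`.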
// Extract Images from Word, Excel, PowerPoint, PDF Documents Programmatically using GroupDocs.Parser for Java
// Imports (package names as used in the GroupDocs.Parser for Java examples)
import com.groupdocs.parser.Parser;
import com.groupdocs.parser.data.PageImageArea;
import com.groupdocs.parser.options.ImageFormat;
import com.groupdocs.parser.options.ImageOptions;

try (Parser parser = new Parser("path/document.pdf")) {
    // Extract images
    Iterable<PageImageArea> images = parser.getImages();
    // Create the options to save images in PNG format
    ImageOptions options = new ImageOptions(ImageFormat.Png);
    int imageNumber = 0;
    // Iterate over images and save each one
    for (PageImageArea image : images) {
        // Print the page index, rectangle and image file type
        System.out.println(String.format("Page: %d, R: %s, Type: %s", image.getPage().getIndex(),
                image.getRectangle(), image.getFileType()));
        image.save(String.format("filesPath/image_%d.png", imageNumber), options);
        imageNumber++;
    }
}
These are the images retrieved from the PDF document using the above code.

Extract Images from Word, Excel, PowerPoint Files in Java
Similarly, images can be extracted from word-processing files, spreadsheets, and presentations with the same code. What do you have to change? Just the source document path, with the right file extension.
Parser parser = new Parser("path/document.docx"); // Word Document
// Parser parser = new Parser("path/document.xlsx"); // Excel Spreadsheet
// Parser parser = new Parser("path/document.pptx"); // PowerPoint Presentation
// Parser parser = new Parser("path/document.pdf"); // PDF Document
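Because only the path changes between formats, it can help to check a file's extension against the supported-formats table before handing it to the parser. The following is a minimal standard-library sketch of such a pre-check. SupportedImageSources is a hypothetical helper, not part of GroupDocs.Parser, and its extension list is copied by hand from the table in this article (with PDF included), so it is an assumption rather than something queried from the API.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical pre-check (not part of GroupDocs.Parser): the extension list
// is transcribed from the supported-formats table in this article.
final class SupportedImageSources {
    private static final Set<String> EXTENSIONS = new HashSet<>(Arrays.asList(
            "doc", "docx", "docm", "dot", "dotx", "dotm", "odt", "ott", "rtf",
            "xls", "xlsx", "xlsm", "xlsb", "xlt", "xltx", "xltm", "ods", "ots",
            "xla", "xlam", "numbers",
            "ppt", "pptx", "pptm", "pps", "ppsx", "ppsm", "pot", "potx", "potm",
            "odp", "otp",
            "pdf", "eml", "emlx", "msg", "zip"));

    private SupportedImageSources() {}

    /** Returns the lower-cased extension of the file name, or "" if there is none. */
    static String extensionOf(String path) {
        int dot = path.lastIndexOf('.');
        int separator = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        // No dot in the file name, a leading dot only, or a trailing dot: no extension.
        if (dot <= separator + 1 || dot == path.length() - 1) {
            return "";
        }
        return path.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** True when the extension appears in the table of supported formats. */
    static boolean isSupported(String path) {
        return EXTENSIONS.contains(extensionOf(path));
    }

    public static void main(String[] args) {
        System.out.println(isSupported("path/document.docx")); // prints "true"
        System.out.println(isSupported("path/notes.txt"));     // prints "false"
    }
}
```

A check like this only looks at the file name; it does not confirm that the file content really matches its extension.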
Image Extraction from Specific Document Page in Java
If you do not want to extract all the images from the whole document at once, you can extract them page by page. The code below walks through the pages of a document and extracts the images of each page by passing the zero-based page index to getImages. To target a single page, call getImages with only that page's index.
// Extract Images from specific page of Word, Excel, PowerPoint, PDF in Java using GroupDocs.Parser
// Imports (package names as used in the GroupDocs.Parser for Java examples)
import com.groupdocs.parser.Parser;
import com.groupdocs.parser.data.PageImageArea;
import com.groupdocs.parser.options.IDocumentInfo;
import com.groupdocs.parser.options.ImageFormat;
import com.groupdocs.parser.options.ImageOptions;

try (Parser parser = new Parser("path/document.pdf")) {
    // Get the document info
    IDocumentInfo documentInfo = parser.getDocumentInfo();
    // Create the options to save images in JPEG format
    ImageOptions options = new ImageOptions(ImageFormat.Jpeg);
    int imageNumber = 0;
    // Iterate over pages
    for (int pageIndex = 0; pageIndex < documentInfo.getPageCount(); pageIndex++) {
        // Print page numbers
        System.out.println(String.format("Page %d/%d", pageIndex + 1, documentInfo.getPageCount()));
        // Iterate over images - ignoring null-checking in the examples
        for (PageImageArea image : parser.getImages(pageIndex)) {
            // Print image information and save the file
            System.out.println(String.format("R: %s, Type: %s", image.getRectangle(), image.getFileType()));
            image.save(String.format("filesPath/image_%d.jpeg", imageNumber), options);
            imageNumber++;
        }
    }
}
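The listing above passes a zero-based pageIndex to getImages and prints pageIndex + 1 for display. When the page comes from a user as a 1-based page number, a small range check before the call avoids asking for a page that does not exist. The sketch below shows one way to do that with plain Java. PageSelection and toPageIndex are hypothetical names, not part of GroupDocs.Parser, and pageCount is assumed to be the value returned by getPageCount.

```java
// Hypothetical helper (not part of GroupDocs.Parser): converts a 1-based
// page number into the zero-based index used in the listing above.
final class PageSelection {
    private PageSelection() {}

    /**
     * Validates a 1-based page number against the page count and
     * returns the matching zero-based page index.
     */
    static int toPageIndex(int pageNumber, int pageCount) {
        if (pageNumber < 1 || pageNumber > pageCount) {
            throw new IllegalArgumentException(
                    String.format("Page %d is outside 1..%d", pageNumber, pageCount));
        }
        return pageNumber - 1;
    }

    public static void main(String[] args) {
        // Page 3 of a 10-page document maps to index 2.
        System.out.println(toPageIndex(3, 10)); // prints "2"
    }
}
```

The returned index could then be used in place of the loop variable, for example `parser.getImages(toPageIndex(3, documentInfo.getPageCount()))`.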
Conclusion
Today, we learned how to extract images from a whole document, and from individual pages, of word-processing documents, spreadsheets, presentations, and PDF files in Java. The code is the same regardless of the file format. We just have to pass the right file path and name. That’s it.