PDF is one of the most popular file formats and is used by almost every business and industry. PDF documents can contain diverse content, including formatted text, images, and annotations, and it is often necessary to extract that content. In this article, we will discuss how to programmatically extract images from PDF documents in Java.

Java API to Extract Images from PDF Files

GroupDocs provides GroupDocs.Parser for Java, an API that lets developers extract images from PDF files. Besides PDF, the same API supports parsing and image extraction for many other document formats, such as word-processing documents, spreadsheets, eBooks, presentations, emails, and ZIP archives.

Download or Configure

You may download the JAR file from the downloads section, or add the following repository and dependency configuration to the pom.xml of your Maven-based Java application.

<repository>
    <id>groupdocs-artifacts-repository</id>
    <name>GroupDocs Artifacts Repository</name>
    <url>https://releases.groupdocs.com/java/repo/</url>
</repository>
<dependency>
    <groupId>com.groupdocs</groupId>
    <artifactId>groupdocs-parser</artifactId>
    <version>22.11</version>
</dependency>

Steps to Extract Images from a PDF document in Java

The following steps show how to get images from a PDF file using a few lines of Java code.

  1. Create a new project.
  2. Download the API as mentioned above, or update to the latest API version.
  3. Import the following classes:
import com.groupdocs.parser.Parser;
import com.groupdocs.parser.data.PageImageArea;
  4. Load the PDF document using the Parser class.
// Load PDF file
try (Parser parser = new Parser("path/document.pdf")) {
    // The Image Extraction Code goes here.
}
  5. Extract all the images from the document using the getImages method.
// Extract Images from the loaded file
Iterable<PageImageArea> images = parser.getImages();
  6. Access each image from the collection and save it using the save method.
// Counter used to give each saved image a unique name
int imageCounter = 0;
// Save each image with its own file extension
for (PageImageArea image : images) {
    image.save(String.format("path/image_%d" + image.getFileType().getExtension(), imageCounter++));
}

Images can be saved in various image formats, such as PNG, JPG, BMP, WebP, and GIF.
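The file-naming part of the save step can also be isolated into a small helper. The sketch below uses only plain Java strings; `ImageFileNames` and `build` are hypothetical names introduced for illustration and are not part of GroupDocs.Parser. It assumes the extension may arrive with or without a leading dot, and it concatenates the parts instead of embedding the extension in a format pattern.

```java
/** Hypothetical helper (not part of GroupDocs.Parser) for naming extracted images. */
final class ImageFileNames {

    private ImageFileNames() {
    }

    /**
     * Builds a file name from a prefix, a counter, and an extension.
     * The extension is accepted with or without a leading dot.
     */
    static String build(String prefix, int index, String extension) {
        if (extension == null || extension.isEmpty()) {
            // No extension available: fall back to prefix and counter only
            return prefix + index;
        }
        String ext = extension.startsWith(".") ? extension : "." + extension;
        // Plain concatenation, so a '%' in the extension is never read as a format specifier
        return prefix + index + ext;
    }

    public static void main(String[] args) {
        System.out.println(build("path/image_", 0, ".png")); // prints path/image_0.png
        System.out.println(build("path/image_", 1, "jpg"));  // prints path/image_1.jpg
    }
}
```

In the loop from the steps, such a helper could be given the value of image.getFileType().getExtension() and an incrementing counter, and its result passed to image.save.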

Java Complete Code – Image Extraction from PDF

Here is the complete source code that allows you to get all the images from the provided PDF file.

// Extract Images from PDF file in Java
try (Parser parser = new Parser("path/document.pdf"))
{
    // Get images
    Iterable<PageImageArea> images = parser.getImages();
    // Check if images extraction is supported
    if (images == null)
    {
        System.out.println("Images extraction isn't supported");
        return;
    }
    int imageCounter = 0;
    // Iterate extracted images
    for (PageImageArea image : images)
    {
        image.save(String.format("path/image_%d" + image.getFileType().getExtension(), imageCounter++));
    }
}
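The code above writes its output under the "path" folder. The article does not say whether the API creates a missing output folder, so the following is a minimal sketch, under the assumption that it may not, of preparing the folder with the standard java.nio.file API before saving. `OutputFolder` and `prepare` are hypothetical names used only for this illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper (not part of GroupDocs.Parser) that prepares the output folder. */
final class OutputFolder {

    private OutputFolder() {
    }

    /**
     * Ensures the folder exists and returns it as a Path.
     * Files.createDirectories also creates missing parent folders and
     * does nothing if the folder is already there.
     */
    static Path prepare(String folder) throws IOException {
        return Files.createDirectories(Paths.get(folder));
    }
}
```

Calling OutputFolder.prepare("path") once before the loop would make sure the target folder is present before the first image is saved.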

Results

Sample PDF Document

Figure: Sample PDF document containing the images to extract.

Extracted Images

Figure: Images extracted from the PDF.

If required, a separate article explains how to extract images from a specific page of a PDF document in Java.

Read More

You can learn more about the data extraction Java API in its documentation, and you can share your queries with us via our forum.

See Also