PDF, one of the most popular file formats, is used by almost every business and industry. PDF documents can contain diverse content, including formatted text, images, and annotations, and it is often necessary to extract that content from PDF files. In this article, we will discuss how to programmatically extract images from PDF documents in Java.

Java API to Extract Images from PDF Files

GroupDocs provides GroupDocs.Parser for Java, an API that Java developers can use to extract images from PDF files. Besides PDF, the same API supports parsing and image extraction for many other document formats, including word-processing documents, spreadsheets, eBooks, presentations, emails, and ZIP archives.

Download or Configure

You can download the JAR file from the downloads section, or add the following repository and dependency configurations to the pom.xml of your Maven-based Java application.

<repositories>
    <repository>
        <id>groupdocs-artifacts-repository</id>
        <name>GroupDocs Artifacts Repository</name>
        <url>https://releases.groupdocs.com/java/repo/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>com.groupdocs</groupId>
        <artifactId>groupdocs-parser</artifactId>
        <version>22.11</version>
    </dependency>
</dependencies>

Steps to Extract Images from a PDF Document in Java

The following steps show how to extract images from a PDF file with a few lines of Java code.

  1. Create a new project.
  2. Add the API as described above, or update to the latest API version.
  3. Import the required GroupDocs.Parser classes.
  4. Load the PDF document using the Parser class.
  5. Extract all the images from the document using the getImages method.
  6. Access each image in the collection and save it using the save method.
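The steps above can be sketched roughly as follows. This is a minimal sketch rather than the article's own listing. It assumes the GroupDocs.Parser dependency configured earlier and the classes Parser, PageImageArea, ImageOptions, and ImageFormat in the packages imported below; the input file sample.pdf, the output folder, and the outputPath and extractImages helpers are hypothetical names used only for illustration.

```java
import com.groupdocs.parser.Parser;
import com.groupdocs.parser.data.PageImageArea;
import com.groupdocs.parser.options.ImageFormat;
import com.groupdocs.parser.options.ImageOptions;

import java.io.File;

public class ExtractImagesFromPdf {

    // Hypothetical helper: builds a path such as "output/image-0.png" for the n-th image.
    static String outputPath(String outputDir, int index, String extension) {
        return new File(outputDir, "image-" + index + "." + extension).getPath();
    }

    // Saves every image in the document as PNG and returns the number of saved images,
    // or -1 if image extraction is not supported for the document.
    static int extractImages(String pdfPath, String outputDir) throws Exception {
        new File(outputDir).mkdirs(); // make sure the output folder exists
        // Step 4: load the PDF document; try-with-resources closes the Parser afterwards
        try (Parser parser = new Parser(pdfPath)) {
            // Step 5: extract all images; getImages() returns null if extraction is unsupported
            Iterable<PageImageArea> images = parser.getImages();
            if (images == null) {
                return -1;
            }
            ImageOptions options = new ImageOptions(ImageFormat.Png);
            int count = 0;
            // Step 6: save each image from the collection to its own PNG file
            for (PageImageArea image : images) {
                image.save(outputPath(outputDir, count, "png"), options);
                count++;
            }
            return count;
        }
    }

    public static void main(String[] args) throws Exception {
        int saved = extractImages("sample.pdf", "output");
        if (saved < 0) {
            System.out.println("Image extraction isn't supported for this document.");
        } else {
            System.out.println("Saved " + saved + " image(s).");
        }
    }
}
```

The null check matters because getImages() signals unsupported documents by returning null rather than an empty collection; saving in another format only requires passing a different ImageFormat value to the ImageOptions constructor.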

Images can be saved in various image formats, such as PNG, JPG, BMP, WebP, or GIF.
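To save the images in a format other than PNG, a different ImageFormat can be passed to ImageOptions. The sketch below maps file extensions to formats. The constant names ImageFormat.Jpeg, ImageFormat.Bmp, ImageFormat.WebP, and ImageFormat.Gif are assumptions based on the API's naming and should be checked against the ImageFormat reference for your version; sample.pdf, the output folder, the FORMATS map, and the formatFor helper are hypothetical.

```java
import com.groupdocs.parser.Parser;
import com.groupdocs.parser.data.PageImageArea;
import com.groupdocs.parser.options.ImageFormat;
import com.groupdocs.parser.options.ImageOptions;

import java.io.File;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class SaveImagesInFormat {

    // Hypothetical mapping from file extensions to GroupDocs.Parser image formats
    // (the constant names are assumed; verify them in the ImageFormat reference).
    static final Map<String, ImageFormat> FORMATS = new LinkedHashMap<>();
    static {
        FORMATS.put("png", ImageFormat.Png);
        FORMATS.put("jpg", ImageFormat.Jpeg);
        FORMATS.put("bmp", ImageFormat.Bmp);
        FORMATS.put("webp", ImageFormat.WebP);
        FORMATS.put("gif", ImageFormat.Gif);
    }

    // Looks up the format for an extension such as "JPG"; rejects unknown extensions.
    static ImageFormat formatFor(String extension) {
        ImageFormat format = FORMATS.get(extension.toLowerCase(Locale.ROOT));
        if (format == null) {
            throw new IllegalArgumentException("Unsupported image extension: " + extension);
        }
        return format;
    }

    public static void main(String[] args) throws Exception {
        // Target extension from the command line, defaulting to "jpg"
        String extension = (args.length > 0 ? args[0] : "jpg").toLowerCase(Locale.ROOT);
        ImageOptions options = new ImageOptions(formatFor(extension));
        new File("output").mkdirs();
        try (Parser parser = new Parser("sample.pdf")) {
            Iterable<PageImageArea> images = parser.getImages();
            if (images == null) {
                System.out.println("Image extraction isn't supported for this document.");
                return;
            }
            int index = 0;
            // Save each image in the chosen format, using a matching file extension
            for (PageImageArea image : images) {
                image.save(new File("output", "image-" + index + "." + extension).getPath(), options);
                index++;
            }
        }
    }
}
```

Running the class with an argument such as webp or bmp would write every image in that format; Locale.ROOT keeps the lowercase conversion independent of the system locale.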

Java Complete Code – Image Extraction from PDF

Here is the complete source code that allows you to get all the images from the provided PDF file.

Results

Sample PDF Document

PDF document containing images to extract.

Extracted Images

Images extracted from the PDF.

If needed, a separate article explains how to Extract Images from any Specific Page of a PDF Document in Java.
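For a single page, the same approach can be narrowed with a page index. This minimal sketch assumes that Parser provides getImages(int pageIndex) with a zero-based index, getFeatures().isImages() to check whether image extraction is supported, and getDocumentInfo().getPageCount() to read the number of pages; sample.pdf, the output folder, and the isValidPage helper are hypothetical.

```java
import com.groupdocs.parser.Parser;
import com.groupdocs.parser.data.PageImageArea;
import com.groupdocs.parser.options.ImageFormat;
import com.groupdocs.parser.options.ImageOptions;

import java.io.File;

public class ExtractImagesFromPage {

    // Hypothetical helper: true if pageIndex is a valid zero-based index for pageCount pages.
    static boolean isValidPage(int pageIndex, int pageCount) {
        return pageIndex >= 0 && pageIndex < pageCount;
    }

    public static void main(String[] args) throws Exception {
        int pageIndex = 0; // zero-based index of the page to process (illustrative choice)
        new File("output").mkdirs();
        try (Parser parser = new Parser("sample.pdf")) {
            // Stop early if the document does not support image extraction
            if (!parser.getFeatures().isImages()) {
                System.out.println("Image extraction isn't supported for this document.");
                return;
            }
            int pageCount = parser.getDocumentInfo().getPageCount();
            if (!isValidPage(pageIndex, pageCount)) {
                System.out.println("The document has only " + pageCount + " page(s).");
                return;
            }
            ImageOptions options = new ImageOptions(ImageFormat.Png);
            int index = 0;
            // Extract and save only the images located on the selected page
            for (PageImageArea image : parser.getImages(pageIndex)) {
                String name = "page" + pageIndex + "-image-" + index + ".png";
                image.save(new File("output", name).getPath(), options);
                index++;
            }
        }
    }
}
```

Validating the index against the page count first avoids asking the parser for a page that does not exist.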

Read More

You can learn more about the Java data extraction API in its documentation, and you can share your questions with us on our forum.

See Also