Portable Document Format (PDF) is a widely used document format developed by Adobe. PDF documents can contain a variety of content, including formatted text, images, annotations, and form fields. Parsing PDF documents programmatically is a common requirement, and there are multiple ways to extract their text. Extracting images, however, is more involved. This article demonstrates how to extract images from PDF documents programmatically in C#.

.NET API to Extract Images from PDF Files

The GroupDocs.Parser for .NET API is used here to extract images from PDF files. Besides PDF, the API supports parsing and image extraction for word-processing documents, spreadsheets, eBooks, presentations, emails, ZIP archives, and many other document formats.

You can download the DLLs or MSI installer from the downloads section or install the API in your .NET application via NuGet.

PM> Install-Package GroupDocs.Parser

Steps to Extract Images from a PDF document using C#

Let’s go step by step through how to get images from a PDF file using a few lines of C# code.

  1. Create a new project.
  2. Download the API as mentioned above or update to the latest API version.
  3. Add the following namespaces:
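using System;
using System.Collections.Generic;
using System.Text;
using GroupDocs.Parser;          // Parser
using GroupDocs.Parser.Data;     // PageImageArea
using GroupDocs.Parser.Options;  // ImageOptions, ImageFormat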
  4. Load the PDF document using the Parser class.
// Create an instance of the Parser class
using (Parser parser = new Parser("path/document.pdf"))
{
    // your code goes here.
}
  5. Extract images from the document using the GetImages method.
// Extract images
IEnumerable<PageImageArea> images = parser.GetImages();
// Check if image extraction is supported
if (images == null)
{
    Console.WriteLine("Images extraction isn't supported");
    return;
}
  6. Access each image in the collection and save it using the Save method.
// Counter used to give each saved image a unique file name
int imageNumber = 0;
// Iterate over retrieved images
foreach (PageImageArea image in images)
{
    // Save the image as a JPEG file
    image.Save("imageFilePath/image-" + imageNumber.ToString() + ".jpeg", new ImageOptions(ImageFormat.Jpeg));
    imageNumber++;
}

You can save the images in various formats, such as JPG, PNG, BMP, WebP, or GIF.
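
As a rough sketch of switching the output format, the variant below saves every extracted image as PNG instead of JPEG. It assumes that `ImageFormat.Png` is the enum member for PNG and that the output folder may not exist yet. The names `PngImageExport`, `BuildImagePath`, and `Run`, along with both path parameters, are hypothetical and not part of the API.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Options;

public static class PngImageExport
{
    // Hypothetical helper (not part of the API): builds "<outputDir>/image-<N>.png"
    public static string BuildImagePath(string outputDir, int imageNumber)
    {
        return Path.Combine(outputDir, "image-" + imageNumber + ".png");
    }

    public static void Run(string pdfPath, string outputDir)
    {
        // Assumption: the output folder may be missing, so create it up front
        Directory.CreateDirectory(outputDir);

        using (Parser parser = new Parser(pdfPath))
        {
            IEnumerable<PageImageArea> images = parser.GetImages();
            // Check if image extraction is supported
            if (images == null)
            {
                Console.WriteLine("Images extraction isn't supported");
                return;
            }

            // Same options class as in the JPEG example, with PNG as the target format
            ImageOptions options = new ImageOptions(ImageFormat.Png);
            int imageNumber = 0;
            foreach (PageImageArea image in images)
            {
                image.Save(BuildImagePath(outputDir, imageNumber), options);
                imageNumber++;
            }
        }
    }
}
```

Calling `PngImageExport.Run("path/document.pdf", "imageFilePath")` would write `image-0.png`, `image-1.png`, and so on into the given folder. Keeping the file extension and the `ImageFormat` value in sync is left to the caller in this sketch.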

C# Complete Code – Image Extraction from PDF

Here is the complete code that will allow you to get all the images from a PDF file.

// Extract images from PDF using C#
using (Parser parser = new Parser("path/document.pdf"))
{
    IEnumerable<PageImageArea> images = parser.GetImages();
    // Check if image extraction is supported
    if (images == null)
    {
        Console.WriteLine("Images extraction isn't supported");
        return;
    }
    ImageOptions options = new ImageOptions(ImageFormat.Jpeg);
    int imageNumber = 0;
    // Iterate over retrieved images
    foreach (PageImageArea image in images)
    {
        // Save Images
        image.Save("imageFilePath/image-" + imageNumber.ToString() + ".jpeg", options);
        imageNumber++;
    }
}

Results

Sample PDF Document

A PDF document containing images to extract.

Extracted Images

Images extracted from the PDF.

If you need it, a separate article explains how to Extract Images from any Specific Page of a PDF Document using C#.

Read More

To learn more about the .NET data extraction API, see its documentation. You can also post your questions on our forum.

See Also