Portable Document Format (PDF) is a widely used document format developed by Adobe. PDF documents can contain a variety of content, including formatted text, images, annotations, and form fields. Parsing PDF documents programmatically is a common use case, and there are multiple ways of extracting the text. Extracting images from a PDF document, however, is a more complex task. This article demonstrates how you can extract images from PDF documents programmatically in C#.
.NET API to Extract Images from PDF Files
The GroupDocs.Parser for .NET API handles the extraction of images from PDF files. Besides PDF, the API supports parsing and extracting images from word-processing documents, spreadsheets, eBooks, presentations, emails, ZIP archives, and many other document formats. You can install it from NuGet using the Package Manager Console:
PM> Install-Package GroupDocs.Parser
Steps to Extract Images from a PDF document using C#
The following steps show how to get images from a PDF file using a few lines of C# code.
- Create a new project.
- Install the API as described above, or update to the latest version.
- Add the required namespaces: GroupDocs.Parser, GroupDocs.Parser.Data, and GroupDocs.Parser.Options.
- Load the PDF document using the Parser class.
- Extract images from the document using the GetImages method.
- Access each image from the collection and save it using the Save method.
You can save the images in various formats, such as JPG, PNG, BMP, WebP, or GIF.
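As a rough illustration of choosing the output format, the sketch below passes an ImageOptions object to the Save method. It assumes the ImageOptions and ImageFormat types from the GroupDocs.Parser.Options namespace; the input file name sample.pdf, the output file names, and the ExtensionFor helper are hypothetical and only serve the example.

```csharp
using System;
using System.Collections.Generic;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Options;

class SaveInChosenFormat
{
    // Hypothetical helper: maps the chosen image format to a file extension.
    // The enum member names are assumed from the GroupDocs.Parser.Options namespace.
    static string ExtensionFor(ImageFormat format)
    {
        switch (format)
        {
            case ImageFormat.Jpeg: return ".jpg";
            case ImageFormat.Bmp:  return ".bmp";
            case ImageFormat.Gif:  return ".gif";
            case ImageFormat.WebP: return ".webp";
            default:               return ".png";
        }
    }

    static void Main()
    {
        // Change this value to save in a different format.
        ImageFormat format = ImageFormat.Jpeg;
        ImageOptions options = new ImageOptions(format);

        // "sample.pdf" is a placeholder path for this sketch.
        using (Parser parser = new Parser("sample.pdf"))
        {
            IEnumerable<PageImageArea> images = parser.GetImages();

            // GetImages returns null when image extraction is not supported.
            if (images == null)
            {
                Console.WriteLine("Image extraction is not supported for this document.");
                return;
            }

            int index = 1;
            foreach (PageImageArea image in images)
            {
                // Each image is converted to the requested format on save.
                image.Save("image-" + index + ExtensionFor(format), options);
                index++;
            }
        }
    }
}
```

Keeping the format in a single variable means the file extension and the save options cannot drift apart when you switch formats.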
C# Complete Code – Image Extraction from PDF
Here is the complete code that will allow you to get all the images from a PDF file.
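The listing below is a minimal sketch of the steps above rather than a definitive implementation. It relies on the Parser class, the GetImages method, and the Save method named in the steps; the default input path, the output folder name, the file naming scheme, and the choice of PNG are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Options;

class ExtractPdfImages
{
    static void Main(string[] args)
    {
        // Assumed defaults: take the PDF path from the command line,
        // otherwise fall back to a placeholder file name.
        string pdfPath = args.Length > 0 ? args[0] : "sample.pdf";
        string outputDir = "extracted-images";
        Directory.CreateDirectory(outputDir);

        // Load the PDF document using the Parser class.
        using (Parser parser = new Parser(pdfPath))
        {
            // Extract the images from the whole document.
            IEnumerable<PageImageArea> images = parser.GetImages();

            // A null result means image extraction is not supported.
            if (images == null)
            {
                Console.WriteLine("Image extraction is not supported for this document.");
                return;
            }

            // PNG is chosen here only as an example output format.
            ImageOptions options = new ImageOptions(ImageFormat.Png);

            int count = 0;
            foreach (PageImageArea image in images)
            {
                count++;
                string target = Path.Combine(outputDir, "image-" + count + ".png");

                // Save each image from the collection to its own file.
                image.Save(target, options);
            }

            Console.WriteLine("Saved " + count + " image(s) to " + outputDir);
        }
    }
}
```

Wrapping the Parser in a using block releases the document when extraction finishes, even if saving one of the images throws.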
Sample PDF Document
If you need it, a separate article explains how to extract images from a specific page of a PDF document using C#.