In the previous post, we discussed how to extract images from documents in Java. Today, we will do the same in C#. Don't worry if you missed that post; this article stands on its own. We will learn how to programmatically extract images from PDF, Excel, PowerPoint, and Word documents in a C# application using a document parsing .NET API.


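The following topics are covered here:

  • Image, Text, and Metadata Extraction .NET API
  • Extract Images from PDF Documents in C#
  • Image Extraction from Word, Excel, PowerPoint Files in C#
  • Extract Images from Specific Document Page in C#
  • Supported Formats for Image Extraction in C#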

Image, Text, and Metadata Extraction .NET API


GroupDocs.Parser for .NET is a document parsing and data extraction API. It supports parsing documents and extracting images, text, and metadata from word-processing documents, spreadsheets, presentations, PDFs, archives, and email documents. The document formats that the API supports for image extraction are listed at the end of this article.

We will use this API throughout the article, so I recommend downloading its binaries or installing it from NuGet to prepare your environment.

Extract Images from PDF Documents in C#

[Image: the sample PDF document used for image extraction]

You can easily retrieve all the images from any PDF document by following these simple steps.

  1. Create a Parser object for the source document.
  2. Call the GetImages method of the Parser class to get a collection of PageImageArea objects, one for each image.
  3. Iterate over the collection to access every image.
  4. Save each image to disk using the Save method of PageImageArea.

Extracted images can be saved in BMP, GIF, JPEG, PNG, and WebP formats. The complete code below demonstrates all of these steps.
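The four steps above can be sketched as follows. This is a minimal example rather than a definitive implementation: the class name, the BuildImagePath helper, the output folder, and the file naming are all hypothetical, and the ImageOptions and ImageFormat types are used as I understand the GroupDocs.Parser API, so please check them against the version you have installed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Options;

public static class PdfImageExtractor
{
    // Hypothetical helper: builds "<outputDir>/image-<n>.png" for the n-th image.
    public static string BuildImagePath(string outputDir, int imageNumber)
    {
        return Path.Combine(outputDir, "image-" + imageNumber + ".png");
    }

    public static void Main()
    {
        string outputDir = "extracted-images"; // assumed output folder
        Directory.CreateDirectory(outputDir);

        // Step 1: create the Parser object for the source document.
        using (Parser parser = new Parser("path/document.pdf"))
        {
            // Step 2: get the collection of PageImageArea objects.
            IEnumerable<PageImageArea> images = parser.GetImages();
            if (images == null)
            {
                Console.WriteLine("Image extraction is not supported for this document.");
                return;
            }

            // Save as PNG here; the other formats mentioned above are
            // selected the same way (assumption: via the ImageFormat enum).
            ImageOptions options = new ImageOptions(ImageFormat.Png);

            // Steps 3 and 4: iterate over the images and save each one.
            int imageNumber = 1;
            foreach (PageImageArea image in images)
            {
                image.Save(BuildImagePath(outputDir, imageNumber), options);
                imageNumber++;
            }
        }
    }
}
```

Numbering the output files keeps them in the order the parser returned them, which makes it easier to match an extracted image back to its place in the document.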

[Image: images extracted from the document using GroupDocs.Parser]

Image Extraction from Word, Excel, PowerPoint Files in C#

The same code is not restricted to PDF. It also extracts all the images from word-processing documents, spreadsheets, and presentations without any changes. Just change the source document path and file extension, and your document will be parsed and all of its images saved to disk.

using (Parser parser = new Parser("path/document.docx")) // Word Document
// using (Parser parser = new Parser("path/document.xlsx")) // Excel Spreadsheet
// using (Parser parser = new Parser("path/document.pptx")) // Presentation
// using (Parser parser = new Parser("path/document.pdf")) // PDF Document
{
    // Extract and save the images exactly as in the PDF steps above
}
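To make the "same code, different path" idea concrete, here is a small sketch that runs one extraction routine over several document types. The class, the ExtractImages and BuildImagePath helpers, and the output folder are hypothetical names introduced for illustration, and the ImageOptions usage is an assumption about the GroupDocs.Parser API that you should verify against your installed version.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Options;

public static class DocumentImageExtractor
{
    // Hypothetical helper: ("out", "path/document.docx", 2) -> "out/document.docx-2.png".
    // The full file name is kept so that document.docx and document.pdf do not collide.
    public static string BuildImagePath(string outputDir, string documentPath, int imageNumber)
    {
        string fileName = Path.GetFileName(documentPath);
        return Path.Combine(outputDir, fileName + "-" + imageNumber + ".png");
    }

    // Extracts every image from one document and returns how many were saved.
    public static int ExtractImages(string documentPath, string outputDir)
    {
        int saved = 0;
        using (Parser parser = new Parser(documentPath))
        {
            IEnumerable<PageImageArea> images = parser.GetImages();
            if (images == null)
            {
                return 0; // image extraction not supported for this document
            }

            ImageOptions options = new ImageOptions(ImageFormat.Png);
            foreach (PageImageArea image in images)
            {
                saved++;
                image.Save(BuildImagePath(outputDir, documentPath, saved), options);
            }
        }
        return saved;
    }

    public static void Main()
    {
        string outputDir = "extracted-images"; // assumed output folder
        Directory.CreateDirectory(outputDir);

        string[] documents =
        {
            "path/document.docx", // Word document
            "path/document.xlsx", // Excel spreadsheet
            "path/document.pptx", // Presentation
            "path/document.pdf"   // PDF document
        };

        foreach (string document in documents)
        {
            int count = ExtractImages(document, outputDir);
            Console.WriteLine(document + ": " + count + " image(s) saved");
        }
    }
}
```

Only the path passed to the Parser constructor changes between document types; the extraction loop itself stays identical.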

Extract Images from Specific Document Page in C#

If you want to extract images from a specific page of a document, you can do so with the following steps and C# code.

  • Get information about the document using the GetDocumentInfo method.
  • Read the total PageCount from the document information to confirm that the target page exists.
  • Call the GetImages(pageIndex) method, passing the index of your target page.
  • Traverse the returned collection and save each image using the Save method.
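The steps above can be sketched as follows. Treat this as an illustration under stated assumptions: the class name, the IsValidPageIndex helper, the target page, and the output naming are hypothetical, and I am assuming that page indexes are zero-based and that ImageOptions and ImageFormat behave as in the GroupDocs.Parser documentation.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using GroupDocs.Parser;
using GroupDocs.Parser.Data;
using GroupDocs.Parser.Options;

public static class PageImageExtractor
{
    // Hypothetical helper: true when pageIndex is a valid zero-based index.
    public static bool IsValidPageIndex(int pageIndex, int pageCount)
    {
        return pageIndex >= 0 && pageIndex < pageCount;
    }

    public static void Main()
    {
        int pageIndex = 1;                     // assumed target: the second page (zero-based)
        string outputDir = "extracted-images"; // assumed output folder
        Directory.CreateDirectory(outputDir);

        using (Parser parser = new Parser("path/document.pdf"))
        {
            // Get the document information and check the page count.
            var info = parser.GetDocumentInfo();
            if (info == null || !IsValidPageIndex(pageIndex, info.PageCount))
            {
                Console.WriteLine("The document does not contain the requested page.");
                return;
            }

            // Get only the images located on the target page.
            IEnumerable<PageImageArea> images = parser.GetImages(pageIndex);
            if (images == null)
            {
                Console.WriteLine("Image extraction is not supported for this document.");
                return;
            }

            // Traverse the collection and save every image as PNG.
            ImageOptions options = new ImageOptions(ImageFormat.Png);
            int imageNumber = 1;
            foreach (PageImageArea image in images)
            {
                string fileName = "page-" + pageIndex + "-image-" + imageNumber + ".png";
                image.Save(Path.Combine(outputDir, fileName), options);
                imageNumber++;
            }
        }
    }
}
```

Checking the index against PageCount first gives a clear message for an out-of-range page instead of relying on whatever the library does in that case.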

Supported Formats for Image Extraction in C#

The following document formats are supported by GroupDocs.Parser for .NET for image extraction.

Document Type              File Formats
Word Processing Documents  DOC, DOCX, DOCM, DOT, DOTX, DOTM, ODT, OTT, RTF
Spreadsheets               XLS, XLSX, XLSM, XLSB, XLT, XLTX, XLTM, ODS, OTS, XLA, XLAM, NUMBERS
Presentations              PPT, PPTX, PPTM, PPS, PPSX, PPSM, POT, POTX, POTM, ODP, OTP
Portable Documents         PDF
Emails                     EML, EMLX, MSG
Archives                   ZIP
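If you process a folder of mixed files, a quick pre-check by file extension can save a parsing attempt. The sketch below simply encodes the list above in a lookup set. The SupportedFormats class and its IsSupported method are hypothetical helpers of my own, not part of the GroupDocs.Parser API, and an extension match does not guarantee that a particular file actually contains extractable images.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SupportedFormats
{
    // Extensions taken from the list of supported formats above.
    private static readonly HashSet<string> Extensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase)
        {
            // Word processing documents
            ".doc", ".docx", ".docm", ".dot", ".dotx", ".dotm", ".odt", ".ott", ".rtf",
            // Spreadsheets
            ".xls", ".xlsx", ".xlsm", ".xlsb", ".xlt", ".xltx", ".xltm",
            ".ods", ".ots", ".xla", ".xlam", ".numbers",
            // Presentations
            ".ppt", ".pptx", ".pptm", ".pps", ".ppsx", ".ppsm",
            ".pot", ".potx", ".potm", ".odp", ".otp",
            // Portable documents
            ".pdf",
            // Emails
            ".eml", ".emlx", ".msg",
            // Archives
            ".zip"
        };

    // Returns true when the file extension appears in the list above.
    public static bool IsSupported(string path)
    {
        if (string.IsNullOrEmpty(path))
        {
            return false;
        }
        return Extensions.Contains(Path.GetExtension(path));
    }

    public static void Main()
    {
        Console.WriteLine(IsSupported("path/document.docx")); // prints "True"
        Console.WriteLine(IsSupported("path/notes.txt"));     // prints "False"
    }
}
```

The comparison ignores case, so document.PDF and document.pdf are treated the same way.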

More about GroupDocs.Parser

Let’s talk some more on the Free Support Forum.