Category Archive: GroupDocs.Parser Product Family
In this article, we will learn how to programmatically extract images from PDF, Excel, PowerPoint, and Word documents in a C# application using a .NET document parsing API.
GroupDocs.Parser for .NET is a document parsing and data extraction API for .NET. It supports parsing documents and extracting images, text, and metadata from word-processing documents, spreadsheets, presentations, archives, and email documents.
Extracted images can be saved in BMP, GIF, JPEG, PNG, and WebP formats.
Today, we will learn how to programmatically extract images from PDF, Excel, PowerPoint, and Word documents using Java. For the image extraction, we will use GroupDocs.Parser for Java. This Java API supports parsing documents and extracting images, text, and metadata from word-processing documents, spreadsheets, presentations, archives, and email documents. Extracted images can be saved in BMP, GIF, JPEG, PNG, and WebP formats. The following topics will be covered in this article:
- Image Extraction Java API
- Image Extraction from PDF Documents in Java
- Extract Images from Word, Excel, and PowerPoint Documents in Java
- Extract Images from a Specific Page in Java
Databases are an integral part of most applications. Whether it is a desktop, web, or mobile application, the database plays a vital role in storing, accessing, and manipulating data. Many database management systems let you create and manage databases.
However, you may need to extract data from database files (i.e., .db files) without installing a database management system or writing SQL queries. How … Continue Reading
Invoices and receipts are documents that record transactions in a particular format whenever goods or services are bought or sold. Commerce has gone digital, and with the popularity of online shopping, digital invoices are widely used. Processing a large number of digital invoices and extracting their information manually is complex as well as time-consuming, so you need a faster and more efficient approach. So in this article, … Continue Reading
Repetition can diminish the value of content. As a writer, you should follow the DRY (don’t repeat yourself) principle. Statistics such as the word count or the number of occurrences of each word let you analyze content, but they are hard to compute manually for multiple documents. So in this article, I’ll demonstrate how to programmatically count words and the number of occurrences of each word in PDF, Word, Excel, PowerPoint, … Continue Reading
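The counting step itself is independent of the document format. A minimal sketch in Java, assuming the text has already been extracted from the document (the extraction call is not shown, and this is not the article's own code): the class `WordStats`, its helper methods, and the tokenization rule (case-insensitive, words are runs of letters, digits, or apostrophes) are illustrative assumptions.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class WordStats {
    // Hypothetical helper: counts occurrences of each word in already-extracted text.
    // Assumption: words are runs of letters, digits, or apostrophes, compared case-insensitively.
    static Map<String, Integer> countOccurrences(String text) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted alphabetically for readable output
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}']+")) {
            if (token.isEmpty()) {
                continue; // split yields a leading empty string if the text starts with a delimiter
            }
            counts.merge(token, 1, Integer::sum); // add 1 to this word's running count
        }
        return counts;
    }

    // The total word count is the sum of all per-word occurrence counts.
    static int totalWords(Map<String, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) {
        String text = "Don't repeat yourself. Repeat less, write more!";
        Map<String, Integer> counts = countOccurrences(text);
        System.out.println("Total words: " + totalWords(counts));
        counts.forEach((word, n) -> System.out.println(word + ": " + n));
    }
}
```

Running `main` on the sample sentence reports 7 words in total, with "repeat" occurring twice. A `TreeMap` keeps the output sorted; a frequency-sorted report would need an extra sort over the map entries.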
Portable Document Format (PDF) is a popular and widely used document format developed by Adobe. PDF documents can contain a variety of content, including formatted text, images, annotations, and form fields. Parsing PDF documents programmatically is a common use case, and there are multiple ways of extracting text. However, extracting images from a PDF document is a complex task. This article demonstrates how easily you can extract images from PDF documents programmatically in C# using GroupDocs.Parser for … Continue Reading
The all-new API v2 of GroupDocs.Parser for .NET has been released! This is big news for those who already use our document parsing API, as well as for those looking for an easy-to-use solution for extracting text, images, and metadata from PDF, word-processing documents, spreadsheets, presentations, emails, EPUB, and ZIP file formats.
What’s new in the API v2?
We have made some major updates at … Continue Reading
Hello everyone! I am back with something new and exciting for developers who deal with automated data extraction from documents. A few years back, we released the GroupDocs.Parser API, which aimed to extract text from various document formats. We kept adding features to it, and today it has become a feature-rich API that provides a wide range of capabilities, including formatted text extraction, highlighted and structured text extraction, metadata extraction, extraction of images … Continue Reading