Extracting pages from a document and saving them as a new file in C#

The first question that may come to mind is whether this applies to PDF documents only. The answer is no. In this blog post, we will see how simple yet helpful it is to extract pages from different kinds of documents (e.g. Word, Excel, Presentation, PDF, HTML, RTF) and save them as a new file using GroupDocs.Merger for .NET. The resultant document will contain only the extracted pages. Learn more about the supported file formats.

Is there any software installation needed?
GroupDocs.Merger for .NET is a back-end API that can be integrated into any existing or new .NET application (e.g. ASP.NET, Windows Forms, Console). It doesn’t matter whether MS Office or a PDF reader is installed on your computer. The API doesn’t rely on any third-party tool or software.

How simple is it?
The following code sample demonstrates how to extract document pages by specifying exact page numbers:

The following code sample demonstrates how to extract document pages by specifying page numbers range:
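The two selection modes described above can also be illustrated without the library. The sketch below is written in Java and does not call GroupDocs.Merger at all: `PageSelection`, `exactPages` and `pageRange` are hypothetical names, and the code only models which 1-based page numbers would end up in the new file, assuming that a range includes both of its ends.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of GroupDocs.Merger: it only models which
// pages get selected, not how the document itself is processed.
class PageSelection {

    // Exact page numbers: validate each requested page and keep the given order.
    static List<Integer> exactPages(int pageCount, int... pages) {
        List<Integer> selected = new ArrayList<>();
        for (int page : pages) {
            if (page < 1 || page > pageCount) {
                throw new IllegalArgumentException("Page out of bounds: " + page);
            }
            selected.add(page);
        }
        return selected;
    }

    // Page range: every page from start to end, both inclusive (an assumption).
    static List<Integer> pageRange(int pageCount, int start, int end) {
        if (start < 1 || end > pageCount || start > end) {
            throw new IllegalArgumentException("Invalid range: " + start + "-" + end);
        }
        List<Integer> selected = new ArrayList<>();
        for (int page = start; page <= end; page++) {
            selected.add(page);
        }
        return selected;
    }

    public static void main(String[] args) {
        System.out.println(exactPages(13, 1, 3)); // prints [1, 3]
        System.out.println(pageRange(13, 2, 5));  // prints [2, 3, 4, 5]
    }
}
```

Validating the page numbers before any extraction starts means an invalid request fails early instead of producing a partial output file.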

Below is the source PDF with 13 pages. Suppose we want to extract only two pages (1 and 3) and create a new PDF.

The output PDF will look like this:

You can download the API here. If you face any issue, post it on the forum.

Posted in GroupDocs.Merger Product Family

Introducing a More Optimized and Simplified GroupDocs.Watermark for .NET API

It has been quite a while since we released the last version of the GroupDocs.Watermark for .NET API. The reason behind this gap was to introduce a simpler and more optimized watermark manipulation API for the .NET platform. Today, I am excited to announce that API v2 of GroupDocs.Watermark for .NET has been released as v19.10 and is available for download.

What’s new in the latest release?

The major updates were made at the architecture level of the API to simplify its usage. Furthermore, we have optimized the codebase and introduced unified classes to handle watermarking operations for all the supported document formats. The key changes are listed below:

  • The Watermarker class is introduced as a single entry point to manage watermarks in the document (instead of the Document class from previous versions).
  • Adding watermarks has been unified for all supported document formats.
  • The product architecture has been redesigned from scratch in order to simplify the usage of different options to manage the watermarks.
  • Document information and preview generation procedures have been simplified.

How to migrate?

The legacy API has been moved into the Legacy namespace, so you’ll have to make a project-wide replacement of namespaces from GroupDocs.Watermark to GroupDocs.Watermark.Legacy to resolve build issues after upgrading to v19.10. Below is a code comparison showing how to use the basic features of the API with the old and the latest versions.

Adding Watermark

The following code samples give you a comparison of how to add watermark to the document using old and new API.

Old API

New API

Searching Watermarks

The following code samples show you the comparison of finding watermarks using search criteria.

Old API

New API

Removing Watermarks

The following code samples demonstrate the comparison of removing all possible watermarks.

Old API

New API

Getting Document Info

The following code samples show how to get document information from the local file.

Old API

New API

For more details, please have a look at the migration notes. Visit the release notes of v19.10 to see all the changes in the public API. You can download or clone the examples project from GitHub to evaluate each feature of the API. If you have any questions or queries, you can raise them via our forum.

Posted in GroupDocs.Watermark Product Family

Count Words and Occurrences of Each Word in a Document using C#

Repetition of data can diminish the worth of the content. As a writer, you should follow the DRY (don’t repeat yourself) principle. Statistics such as the word count or the number of occurrences of each word let you analyze the content, but they are hard to collect manually for multiple documents. So in this article, I’ll demonstrate how to programmatically count words and the number of occurrences of each word in PDF, Word, Excel, PowerPoint, Ebook, Markup, and Email document formats using C#. For extracting text from documents, I’ll be using GroupDocs.Parser for .NET, which is a powerful document parsing API.

Steps to count words and their occurrences in C#

1. Create a new project.

2. Install GroupDocs.Parser for .NET using NuGet Package Manager.

3. Add the following namespaces.

4. Create an instance of the Parser class and load the document.

5. Extract the text from the document into a TextReader object using the Parser.GetText() method.

6. Split up the text into words, save them into a string array and perform word count.

7. Order the words by their occurrence count and display the results.
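Steps 6 and 7 are independent of the parsing library, so they can be sketched on their own. The following is a minimal Java illustration that starts from text that has already been extracted; it does not use GroupDocs.Parser, and the class name `WordStats`, the splitting rule (anything that is not a letter, digit or apostrophe separates words) and the case-insensitive counting are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper that works on plain text, however it was extracted.
class WordStats {

    // Step 6: split the text into words and count each one (case-insensitive).
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Any run of characters that is not a letter, digit or apostrophe is a separator.
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}']+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Step 7: order the words by occurrence count, most frequent first;
    // words with the same count are listed alphabetically.
    static List<Map.Entry<String, Integer>> byFrequency(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countWords("The quick fox. The lazy dog; the end.");
        int total = 0;
        for (int count : counts.values()) {
            total += count;
        }
        System.out.println("Total words: " + total);
        for (Map.Entry<String, Integer> entry : byFrequency(counts)) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

Lower-casing before counting treats "The" and "the" as the same word; drop that call if case should be significant.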

Complete Code

Results

Read more about GroupDocs.Parser for .NET API here. Leave your questions or queries on our forum.

Posted in GroupDocs.Parser Product Family

Annotate documents programmatically using GroupDocs.Annotation for .NET

How about annotating a document programmatically? GroupDocs.Annotation for .NET offers a wide range of annotations (e.g. graphical, watermark, redaction, underline, etc.) that can be added to any supported document without installing third-party software or tools.

If you are already using GroupDocs.Annotation for .NET, you can migrate to version 19.9 because:

  • The Annotator class is introduced as a single entry point to manage the annotation process for any supported file format
  • The overall rendering speed is improved dramatically by saving each page as soon as it is rendered, rather than after all pages are rendered
  • Document saving options are simplified, so it is easy to instantiate the proper options class and control the annotating and saving processes

Let’s see the code difference. Below is the old code style.

Below is the new code style.

The legacy API has been moved into the Legacy namespace, so after updating to this version you need to make a project-wide replacement of namespace usages from GroupDocs.Annotation to GroupDocs.Annotation.Legacy to resolve build issues.

You can download the API here and post your concerns on the forum.



Posted in GroupDocs.Annotation Product Family

Convert Spreadsheets using GroupDocs.Conversion for .NET and Java

Reason to use a Document Conversion API

The world is becoming a global village, and businesses everywhere interact and collaborate with hundreds of institutions across the globe. As a result, the data gathered from different sources comes in a number of different formats. Even the data within a single organization may be compiled in different formats, depending on the person or department gathering and maintaining it. You may also find that older files used within the company are no longer compatible with its needs because of changes in company policies or in the software it uses.

To handle such scenarios and to fuel information governance and digital transformation initiatives, organizations must extract valuable information and business insights from documents that have become incompatible or were never in a format supported by their systems. However, before an enterprise can leverage the information it already has, it must transform its unconsolidated content into a universal, readily accessible format. In doing so, it also needs to make sure that the valuable data in the original files is not lost during conversion.

GroupDocs.Conversion for .NET and Java

GroupDocs.Conversion, available for both .NET and Java, makes document conversion easy, and automatic conversion saves the time and effort of rewriting a document in the desired format. It provides fast conversion between a variety of formats while preserving the safety and quality of the documents. The API has numerous features, the most common of which include converting a single page or the whole document to the desired format and applying a watermark to the converted document at the same time.

Skip Empty Rows and Columns

Today, we will discuss another important feature of the API that is extremely helpful when converting spreadsheet documents. With this feature, you can optionally hide the empty rows and columns during conversion. The steps for using this feature are:

  • Set the “SkipEmptyRowsAndColumns” property in the API
  • Load your desired spreadsheet document
  • Convert the document into your desired format
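To make the effect of this option concrete, here is a small Java sketch of the idea on a plain two-dimensional array. It is not GroupDocs.Conversion code: `EmptyCellFilter` and its methods are hypothetical, and the sketch assumes that a row or column counts as empty when every cell in it is null or whitespace. How the API itself decides this is not covered by this post.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of skipping empty rows and columns in a grid of cells.
class EmptyCellFilter {

    static boolean isBlank(String cell) {
        return cell == null || cell.trim().isEmpty();
    }

    // Returns a copy of the grid without rows and columns that contain only blank cells.
    static String[][] skipEmptyRowsAndColumns(String[][] grid) {
        int columnCount = 0;
        for (String[] row : grid) {
            columnCount = Math.max(columnCount, row.length);
        }

        // First pass: remember which rows and columns hold at least one value.
        boolean[] keepColumn = new boolean[columnCount];
        List<String[]> keptRows = new ArrayList<>();
        for (String[] row : grid) {
            boolean rowHasData = false;
            for (int c = 0; c < row.length; c++) {
                if (!isBlank(row[c])) {
                    keepColumn[c] = true;
                    rowHasData = true;
                }
            }
            if (rowHasData) {
                keptRows.add(row);
            }
        }

        int keptColumns = 0;
        for (boolean keep : keepColumn) {
            if (keep) {
                keptColumns++;
            }
        }

        // Second pass: copy only the kept cells into the result.
        String[][] result = new String[keptRows.size()][keptColumns];
        for (int r = 0; r < keptRows.size(); r++) {
            String[] row = keptRows.get(r);
            int target = 0;
            for (int c = 0; c < columnCount; c++) {
                if (keepColumn[c]) {
                    result[r][target++] = c < row.length ? row[c] : null;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[][] sheet = {
            {"Name", "", "Qty"},
            {"", "", ""},
            {"Pen", "", "3"}
        };
        for (String[] row : skipEmptyRowsAndColumns(sheet)) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Two passes are used because a column can only be judged empty after every row has been inspected.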

The following example demonstrates how this feature can be used when converting to PDF using GroupDocs.Conversion for .NET:

Java developers can use the following code:

The converted document will look like the following screenshot:

Posted in GroupDocs.Conversion Product Family

Find and Remove Watermarks from Documents in Java

This article is for Java developers who are looking for a way to find and remove text or image watermarks from PDF, Word, Excel, PowerPoint, Visio and Email documents.

The GroupDocs.Watermark for Java API supports adding text and image watermarks to a wide range of document formats. In addition, it can find and remove watermarks from documents, including watermark objects that were added using third-party tools. So let me demonstrate how you can remove watermarks from a document in a few steps in Java.

Before we begin, have a look at the following PDF document which contains a text as well as an image watermark. We’ll use this document and remove the watermarks from it.

Steps to remove watermarks from a document

1. Create a new project.

2. Add the following imports.

3. Create an instance of the Document class and load the source document.

4. Find the watermarks based on search criteria using the findWatermarks method (if you don’t pass any search criteria, findWatermarks will return all the possible watermark objects).

5. Iterate over the watermark collection and remove watermarks using the removeAt method.

6. Save the resultant document using the save method.
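One detail of step 5 is worth a closer look: removing items by index while iterating over the same collection shifts the positions of the remaining items. The sketch below shows a common way to handle that by walking the collection backwards. It uses a plain `List<String>` as a stand-in for the watermark collection and does not call GroupDocs.Watermark; `RemoveByIndex` and `removeMatching` are hypothetical names, and matching by text is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in: a list of strings plays the role of the found watermarks.
class RemoveByIndex {

    // Removes every entry that contains the search text and returns how many were removed.
    static int removeMatching(List<String> watermarks, String searchText) {
        int removed = 0;
        // Walk from the last index down so that removing an item does not
        // shift the positions of the items still to be visited.
        for (int i = watermarks.size() - 1; i >= 0; i--) {
            if (watermarks.get(i).contains(searchText)) {
                watermarks.remove(i); // index-based removal, like a removeAt call
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        List<String> found = new ArrayList<>(
                Arrays.asList("Confidential", "Draft", "Confidential copy", "Logo"));
        int removed = removeMatching(found, "Confidential");
        System.out.println(removed + " removed, remaining: " + found);
    }
}
```

A forward loop that removes at index `i` and then increments `i` would skip the element that slides into the freed position, which is why the backward loop is used here.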

Complete Code

Results

The following is the screenshot of the resultant PDF document that we get after removing the watermarks.

Explore more about GroupDocs.Watermark for Java API here. In case of any queries, reach us at our forum.

Posted in GroupDocs.Watermark Product Family

Index each letter as a separate word using GroupDocs.Search for .NET

Are you looking for a full-text search API that allows you to search across many document formats? In that case, GroupDocs.Search for .NET will meet your requirements. The API creates an index and then performs instant searches across thousands of documents.

For those who are already working with the API, we have some new features and improvements. Moreover, some classes have been renamed to improve code readability. The changes in the new version 19.10 are minor, so the migration will not be too difficult. The API architecture is optimized for better performance.
After upgrading to v19.10, you need to replace the namespace usage across the entire project from GroupDocs.Search to GroupDocs.Search.Legacy to resolve build issues.

Let’s go through the code changes:
Old code sample:

New code snippet:

You can observe the minor changes (e.g. SearchParameters is changed to SearchOptions).

Improvements

  • Highlight search results in short fragments
  • Enhance document metadata indexing with new formats

New Features

  • Index each letter as a separate word
  • Implemented ability to remove paths from index

How to highlight search results in short fragments?
This improvement allows highlighting the search results in separate short fragments of the text rather than in the whole document. The example below shows how to generate short HTML snippets with the found terms highlighted:
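As a library-independent illustration of the idea, the Java sketch below builds one short HTML fragment per occurrence of a term, keeping only a few characters of context on each side. It does not use GroupDocs.Search; the class `FragmentHighlighter`, the `<b>` markup and the character-based context window are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: one short highlighted fragment per hit instead of the whole text.
class FragmentHighlighter {

    // Returns an HTML fragment for every case-insensitive occurrence of the term,
    // with up to `context` characters of surrounding text on each side.
    static List<String> fragments(String text, String term, int context) {
        List<String> result = new ArrayList<>();
        if (term.isEmpty()) {
            return result;
        }
        int i = 0;
        while (i + term.length() <= text.length()) {
            if (text.regionMatches(true, i, term, 0, term.length())) {
                int end = i + term.length();
                int start = Math.max(0, i - context);
                int stop = Math.min(text.length(), end + context);
                result.add(escape(text.substring(start, i))
                        + "<b>" + escape(text.substring(i, end)) + "</b>"
                        + escape(text.substring(end, stop)));
                i = end; // continue searching after this hit
            } else {
                i++;
            }
        }
        return result;
    }

    // Escapes the characters that would otherwise be read as HTML markup.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        String text = "Index the documents, then search the index.";
        for (String fragment : fragments(text, "index", 5)) {
            System.out.println(fragment);
        }
        // prints "<b>Index</b> the " and " the <b>index</b>."
    }
}
```

Escaping the surrounding text before adding the `<b>` tags keeps characters from the document from being interpreted as markup in the generated snippet.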

How to enhance document metadata indexing with new formats?
This improvement adds support for new document formats. These are mostly formats whose main content is not textual, so only the metadata of these documents is indexed:

  • MP3 – MPEG-2 Audio Layer III;
  • WAV – Waveform Audio File Format;
  • BMP – Bitmap Picture;
  • GIF – Graphics Interchange Format File;
  • JP2 – JPEG 2000 Core Image File;

For the complete list, visit this article.

How to index each letter as a separate word?
This feature is designed to work with hieroglyphic languages and allows you to index each character in the text as a separate word, regardless of the presence of separators.
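The behaviour can be pictured as a tokenizer that emits one token per character instead of one token per separator-delimited word. The Java sketch below is only an illustration of that idea and does not use GroupDocs.Search; `CharacterTokenizer` is a hypothetical name, and skipping whitespace and punctuation is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tokenizer: every letter or digit becomes its own "word".
class CharacterTokenizer {

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            // Work on code points so that characters outside the Basic
            // Multilingual Plane are not split into two halves.
            int codePoint = text.codePointAt(i);
            if (Character.isLetterOrDigit(codePoint)) {
                tokens.add(new String(Character.toChars(codePoint)));
            }
            i += Character.charCount(codePoint);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Three kanji written as Unicode escapes, followed by a space and "API".
        String text = "\u65E5\u672C\u8A9E API";
        System.out.println(tokenize(text).size()); // prints 6
    }
}
```

With one token per character, a query for a single character can match text that contains no spaces between words.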

Ability to remove paths from index
When indexed paths are removed from an index, the index is updated and all removed documents and folders become inaccessible for search.


We recommend that you download the latest version and share your experience. In case of any issues, you can post on the forum.



Posted in GroupDocs.Search Product Family

Add watermark to the converted document in C# or Java

Document conversion is one of the most frequent processes across many industries. Sometimes, a business needs to put a watermark on the resultant document. For example, you may want to convert a PPTX to PDF with a watermark (text or image) on all the PDF pages.

GroupDocs.Conversion for .NET gives you such an option. It provides a WatermarkOptions class with rich properties such as:

  • Text/Font
  • Color
  • Width
  • Height
  • Background
  • Transparency
  • Rotation angle

Let’s have a look at its implementation in C#:

Document conversion in Java:



The screenshot below shows the conversion of a PPTX to PDF along with the watermark text.

Download the API here. If there’s any issue, you can post on the forum.

Posted in GroupDocs.Conversion Product Family

Introducing Z-Order for Text Signatures in GroupDocs.Signature for .NET 19.9

We are back with another monthly release of our eSign API, GroupDocs.Signature for .NET. This release includes three new features, three improvements, and a bug fix. So in this article, I’ll give you a walk-through of the latest release.

Support of Z-Order for Text Signatures

Z-order determines the order of overlapping two-dimensional objects in an x-y plane. Put simply, if the X-axis represents the width and the Y-axis represents the height, then the Z-order represents the depth of the object. An object with a higher Z-order value will appear in front of the overlapping objects with lower values. This option has now been added to our eSignature API for text signatures. For now, this property is supported for the following document formats:

  • PDF documents
  • Word processing documents
  • Spreadsheets

We have added the ZOrder property in the TextSignOptions class and this property is responsible for ordering the overlapping text signatures. Let’s understand the usage of this property with the help of a code sample.
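The ordering rule itself can be sketched without the library. The Java example below sorts overlapping items by their Z-order value to obtain the paint order, so that the item with the highest value is drawn last and ends up in front. It does not use GroupDocs.Signature: `ZOrderDemo` and `TextItem` are hypothetical stand-ins for text signatures, and keeping the original order for items with equal values is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of how a Z-order value decides which item is in front.
class ZOrderDemo {

    // Stand-in for a text signature: just a label and its Z-order value.
    static final class TextItem {
        final String text;
        final int zOrder;

        TextItem(String text, int zOrder) {
            this.text = text;
            this.zOrder = zOrder;
        }
    }

    // Returns the labels in paint order: lowest Z-order first, so the item with
    // the highest value is painted last and covers the others where they overlap.
    static List<String> paintOrder(List<TextItem> items) {
        List<TextItem> sorted = new ArrayList<>(items);
        // List.sort is stable, so items with equal Z-order keep their original order.
        sorted.sort(Comparator.comparingInt(item -> item.zOrder));
        List<String> labels = new ArrayList<>();
        for (TextItem item : sorted) {
            labels.add(item.text);
        }
        return labels;
    }

    public static void main(String[] args) {
        List<TextItem> items = Arrays.asList(
                new TextItem("Approved", 2),
                new TextItem("Draft", 1),
                new TextItem("Signed", 3));
        System.out.println(paintOrder(items)); // prints [Draft, Approved, Signed]
    }
}
```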

Improvements

The following is the list of improvements we have made in v19.9.

  • Improved the feature of saving signed .djvu files
  • Improved saving word processing documents to various formats
  • Implemented the transparent background for stamps

Bug Fix

In previous versions, the Signature property of the DigitalSignOptions class did not affect the signing process, and an exception was thrown if CertificateStream or CertificateFilePath was not set. This issue has now been resolved, and the fix is available in v19.9.

You can download the latest release from our downloads section or install it in your project using NuGet. For more details, consult the product documentation. Let us know your suggestions or concerns via our forum.

Posted in GroupDocs.Signature Product Family

Adjust contrast when converting a document to image

Until now, document-to-image conversion was a simple process. With the release of GroupDocs.Conversion for Java 19.10, we’ve added a number of interesting features for image conversion.

  • Set color mode when converting to JPEG
  • Option to set compression mode
  • Adjust image brightness and contrast
  • Set gamma
  • Option to flip image

Have a look at its implementation:
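For readers who want to see what brightness, contrast and gamma adjustments do to pixel values, here is a small Java sketch based on commonly used formulas. It is not GroupDocs.Conversion code and makes no claim about how the API computes these adjustments; `ImageAdjust` and its method names are hypothetical, and the contrast formula (scaling around mid-gray 128) is one of several conventions in use.

```java
// Hypothetical illustration of per-channel brightness, contrast and gamma adjustments.
class ImageAdjust {

    // Rounds and limits a computed value to the valid 0..255 channel range.
    static int clamp(double value) {
        return (int) Math.max(0, Math.min(255, Math.round(value)));
    }

    // brightness: offset added to the channel; contrast: factor applied around
    // mid-gray (128), where 1.0 leaves the contrast unchanged.
    static int adjustChannel(int value, double brightness, double contrast) {
        return clamp((value - 128) * contrast + 128 + brightness);
    }

    // Gamma correction on a normalized value; gamma above 1.0 brightens mid-tones.
    static int applyGamma(int value, double gamma) {
        return clamp(255.0 * Math.pow(value / 255.0, 1.0 / gamma));
    }

    // Applies the adjustment to the red, green and blue channels of an ARGB pixel
    // and leaves the alpha channel untouched.
    static int adjustPixel(int argb, double brightness, double contrast) {
        int a = (argb >>> 24) & 0xFF;
        int r = adjustChannel((argb >>> 16) & 0xFF, brightness, contrast);
        int g = adjustChannel((argb >>> 8) & 0xFF, brightness, contrast);
        int b = adjustChannel(argb & 0xFF, brightness, contrast);
        return (a << 24) | (r << 16) | (g << 8) | b;
    }

    public static void main(String[] args) {
        System.out.println(adjustChannel(200, 0, 1.5));  // prints 236
        System.out.println(adjustChannel(200, 50, 1.5)); // prints 255 (clamped)
        System.out.println(Integer.toHexString(adjustPixel(0xFF6496C8, 10, 1.0))); // prints ff6ea0d2
    }
}
```

Clamping matters because a strong contrast factor or brightness offset easily pushes values outside the 0..255 range that a channel can hold.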

Conversion from CDR
CDR is a vector graphics file format. The API now supports conversion from CDR.

Improvements
Along with the new features, there are some improvements and bug fixes as well. The major improvements are:

  • Conversion from Excel95/5.0 XLS files
  • Set image quality when converting to WebP
  • Remove HideComments from SaveOptions

Bug Fixes
Previously, there was an issue (Exception: Cannot open an image) in the conversion of Cells to Image. We’ve fixed it.
The API conflict with Ch.qos.logback is also resolved.
Moreover, you can now generate scalable/adjustable HTML. HtmlSaveOptions has a property named FixedLayout; setting it to false generates simplified markup and layout.

We recommend that you download and integrate API version 19.10 to enhance your document conversion experience. If you face any issue, post it on the forum.

Posted in GroupDocs.Conversion Product Family