Split or Merge PDF, Word, Excel Documents in C# or Java

Need to merge or split documents of various types on multiple platforms? You may have questions such as:

  • How do I merge PDF documents in Java?
  • How do I split Word documents or merge Excel spreadsheets in C#?
  • What should I do if I need to merge PPT/PPTX presentations?
  • And many more; the list goes on.

GroupDocs provides a document merging solution for all such requirements. GroupDocs.Merger allows you to merge documents and manipulate document structure across a wide range of supported document formats. It also supports manipulating document pages, page transformations, extracting information from documents, generating previews, and much more. The merger solution is not confined to native APIs.

Merge with GroupDocs.Merger On-Premise APIs

In this article, we will take a brief look at the native APIs of GroupDocs.Merger, which are currently available for .NET and Java.

Merge Documents in Java

GroupDocs.Merger for Java

GroupDocs.Merger is a simple way to merge two or more documents of the same format. The code snippet below, taken from the examples, shows how to merge PDF documents using the Java API.

The steps are simple once you have decided which documents to combine:

  • Instantiate a Merger object with the first document, into which the other documents will be merged.
  • Call the join method, passing the document to merge. Call join again to merge more documents.
  • Call the save method to save the final output.
  • That's it.
// Set paths for the documents to join together in a single file.
String filePath1 = "document-1.pdf";
String filePath2 = "document-2.pdf";
String filePath3 = "document-3.pdf";
// Create Merger object containing the first document to be merged with other documents of the same format.
Merger merger = new Merger(filePath1);
// Join Documents.
merger.join(filePath2);
merger.join(filePath3);
// Save the merged document.
String filePathOutput = "mergedDocument.pdf";
merger.save(filePathOutput);

The Merger API is not only about merging PDF documents; a long list of document formats can be merged with it. Whether you need to merge word processing formats like DOC/DOCX, XLS/XLSX Excel spreadsheets, PPT/PPTX presentations, Visio drawings, web formats, page description language formats, or eBooks, GroupDocs.Merger supports your format. Here is the list of supported file formats. Visit the GroupDocs docs to stay updated.

Document Type: File Formats
Word Processing: DOC, DOCX, DOCM, DOT, DOTX, DOTM, ODT, OTT, RTF, TXT
Spreadsheets: XLS, XLSX, XLSM, XLSB, XLT, XLTX, XLTM, ODS, CSV, TSV
Presentations: PPT, PPTX, PPS, PPSX, ODP, OTP
Drawings: VSDX, VSDM, VSSX, VSSM, VSTX, VSTM, VDX, VSX, VTX
Web: HTML, MHT
Page Description Languages: TEX, XPS
eBooks & Others: PDF, EPUB, ONE

Merge Document Pages in C#

GroupDocs.Merger for .NET

The Merger API also lets you merge only selected pages of a document, either by listing specific pages or by specifying a page range. The code stays similar to the example above; the only change is that you pass JoinOptions when joining.

Below is the source code snippet from the examples that shows how to merge documents by specifying certain pages using the GroupDocs.Merger for .NET API.

// Set paths for the documents to join together in a single file.
string filePath1 = @"document-1.docx";
string filePath2 = @"document-2.docx";
string filePathOutput = @"mergedDocument.docx";
// Create join options.
JoinOptions joinOptions = new JoinOptions(1, 4, RangeMode.OddPages);
// Join the second document and save.
using (Merger merger = new Merger(filePath1))
{
    merger.Join(filePath2, joinOptions);
    merger.Save(filePathOutput);
}
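
To make the effect of the join options concrete: assuming that JoinOptions(1, 4, RangeMode.OddPages) means "from pages 1 through 4, keep only the odd-numbered ones" (an interpretation of the constructor arguments, not confirmed behavior), the pages taken from the second document would be 1 and 3. The plain-Java sketch below only models that selection rule. PageRangeSketch, Mode, and selectPages are hypothetical names and are not part of GroupDocs.Merger.

```java
import java.util.ArrayList;
import java.util.List;

public class PageRangeSketch {

    // Hypothetical stand-in for the library's range mode; not the GroupDocs enum.
    enum Mode { ALL_PAGES, ODD_PAGES, EVEN_PAGES }

    // Returns the 1-based page numbers in [start, end] that the given mode keeps.
    static List<Integer> selectPages(int start, int end, Mode mode) {
        if (start < 1 || end < start) {
            throw new IllegalArgumentException("Expected 1 <= start <= end");
        }
        List<Integer> pages = new ArrayList<>();
        for (int page = start; page <= end; page++) {
            boolean odd = page % 2 == 1;
            if (mode == Mode.ALL_PAGES
                    || (mode == Mode.ODD_PAGES && odd)
                    || (mode == Mode.EVEN_PAGES && !odd)) {
                pages.add(page);
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        System.out.println(selectPages(1, 4, Mode.ODD_PAGES)); // prints [1, 3]
    }
}
```

Under the same assumption, switching the mode to even pages would keep pages 2 and 4 of the range.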

GroupDocs.Merger is not limited to merging documents. It offers a long list of features for the supported file formats. Here is one of them, to give you an idea of what the APIs can do:

Split Document into Multiple Documents

You may need to split a document in several different ways:

  • Split by exact page numbers
  • Split by page range
  • Split by Even and Odd pages
  • Split a document to several multi-page documents

All of the above are offered by the GroupDocs.Merger API. The code example below shows how to split a Word document by providing exact page numbers using the Java API:

String filePath = "document.docx";
String filePathOut = "document_{0}.{1}";
// Split the document into multiple single page documents.
PageSplitOptions splitOptions = new PageSplitOptions(filePathOut, new int[] { 3, 6, 8 });
Merger merger = new Merger(filePath);
merger.split(splitOptions);
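
The {0} and {1} placeholders in the output path are presumably filled with the page number and the file extension of each resulting file. The sketch below uses java.text.MessageFormat purely to illustrate that assumed naming scheme. It does not call GroupDocs.Merger, and SplitNamingSketch and outputNames are hypothetical names.

```java
import java.text.MessageFormat;
import java.util.ArrayList;
import java.util.List;

public class SplitNamingSketch {

    // Expands a template such as "document_{0}.{1}" once per page number.
    // Assumption: {0} is the page number and {1} is the file extension.
    static List<String> outputNames(String template, int[] pages, String extension) {
        List<String> names = new ArrayList<>();
        for (int page : pages) {
            // String.valueOf avoids locale-specific number grouping (e.g. "1,000").
            names.add(MessageFormat.format(template, String.valueOf(page), extension));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(outputNames("document_{0}.{1}", new int[] {3, 6, 8}, "docx"));
        // prints [document_3.docx, document_6.docx, document_8.docx]
    }
}
```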

Thanks for reading. Feel free to contact us on the forum if you run into any difficulty, have questions, or want to share suggestions.

Posted in GroupDocs.Merger Product Family

Convert WebP to PNG, JPG, and PDF in Java

WebP is an image format introduced by Google that provides both lossless and lossy compression for images on the web. WebP images are smaller than well-known, widely used formats like PNG and JPG, and therefore provide a faster web experience.

WebP images offer transparency like PNG and animation like GIF and, most importantly for any web developer, are smaller than JPG images of comparable quality. Even so, the format is still not universally supported. This incomplete support sometimes forces developers to convert WebP images into PNG, JPG, or other formats.


GroupDocs provides a solution for converting 50+ document and image file formats. As a developer, you can use the GroupDocs.Conversion on-premise and cloud APIs to convert WebP images in applications written in Java, .NET, and many other supported programming languages. As a regular user, you can use the GroupDocs.Conversion App to get your WebP image files converted.

Convert WebP to PNG format

While using the GroupDocs.Conversion API, you can get the possible conversion formats of the source document by using the getPossibleConversions() method of the ConversionHandler class. You can either pass the source document as an InputStream or just pass the file extension of the source document to get the possible conversion formats.

The source code below shows how easily you can convert a WebP image to PNG format. To convert a WebP file to some other supported format, you just have to change the output format by setting the appropriate ImageFileType. For instance, to convert WebP to JPG, change the ImageFileType from PNG to JPG.

ConversionHandler conversionHandler = new ConversionHandler(Utilities.getConfiguration());
// Create and set Image Saving Options
SaveOptions saveOption = new ImageSaveOptions();
saveOption.setConvertFileType(ImageSaveOptions.ImageFileType.PNG);
// Convert the WebP image to PNG or JPG format
String fileName = "image.webp";
ConvertedDocument convertedDocumentPath = conversionHandler.convert(fileName, saveOption);
SaveInfo saveInfo = convertedDocumentPath.save(fileName + "." + convertedDocumentPath.getFileType());

Convert WebP to PDF in Java

A WebP image can be converted not only into other image formats; the GroupDocs.Conversion API also allows conversion into many document file formats. The following example shows how a Java developer can quickly convert a WebP image into PDF (Portable Document Format).

ConversionHandler conversionHandler = new ConversionHandler(Utilities.getConfiguration());
// Create and set PDF Save Options
PdfSaveOptions saveOption = new PdfSaveOptions();
saveOption.getPdfOptions().getFormatingOptions().setPageLayout(PdfFormattingOptions.PdfPageLayout.SinglePage);
// Convert the source WebP image to PDF document.
String sourceFileName = "image.webp";
ConvertedDocument convertedDocumentPath = conversionHandler.convert(sourceFileName, saveOption);
SaveInfo saveInfo = convertedDocumentPath.save(sourceFileName + "." + convertedDocumentPath.getFileType());

Many other open-source examples are publicly available in the GitHub repository. Download the source code and quickly run the examples using the getting started guide. In case of any difficulty, look at the documentation or reach us at any time on the forum.

Have a nice coding day!

Posted in GroupDocs.Conversion Product Family

GroupDocs.Total Discount Offer ends January 31st


Monthly Newsletter

January 2020

25% off Conholdate.Total
Hurry! Offer ends January 31st.
 

Get 25% off GroupDocs.Total for .NET and Java. Quote HOLOFF2019 when placing your order.

 

This offer is only available on new GroupDocs.Total purchases and cannot be used in conjunction with other offers, renewals or upgrades. Only available directly from groupdocs.com, not through third parties or resellers. Ts&Cs Apply.

Product News

From the Library

Feedback

GroupDocs for .NET | GroupDocs for Java | GroupDocs for Cloud APIs

Product Releases and Updates
Posted in Customer Newsletters

Important Bug Fixes in GroupDocs.Viewer for .NET 19.11


We have rolled out another update for GroupDocs.Viewer for .NET featuring some important bug fixes as well as an improvement related to the MSI package. This release does not bring any new features, but it addresses some important issues related to the PDF, DWG, and ODG file formats. Furthermore, a few compatibility issues that appeared under .NET Standard 2.0 have been resolved. So let's have a brief overview of the bug fixes and improvements we have introduced in v19.11.

Issue: Rendering DWG to image (PNG/JPG) or PDF resulted in an empty output

This issue appeared for some specific DWG files: the contents of the source file were missing in the output, which resulted in blank images or PDF documents.

Issue: The code hangs when rendering PDF document to HTML

One of our customers faced an issue where the API was taking too long to render a particular PDF document into HTML. We have resolved this issue and improved the performance of the API when rendering such PDF documents.

Issue: Console output is printed when rendering ODG images

In previous versions, unnecessary messages were printed in the console window while rendering ODG images. Although this did not affect the API's functionality or the output, it might have confused developers. We have fixed this issue to prevent unexpected messages from being printed in the console window.

Issue: Compatibility issues under .NET Standard 2.0

In the previous release, we added support for .NET Standard 2.0 for cross-platform development using GroupDocs.Viewer for .NET. This enhancement raised some internal compatibility issues, which we have fixed in v19.11.

Improvement: New ProjectGuid and UpgradeCode for MSI package

We have updated the unique identifier that the OS uses to identify the application installed with an MSI package. Because of this update, you need to manually uninstall the previous version of GroupDocs.Viewer for .NET before installing v19.11 using the MSI package.

Since updates are always important, we recommend upgrading to v19.11 in your applications. If you face any issue or have any questions, feel free to share them with us via our forum.

Posted in GroupDocs.Viewer Product Family

Classify text using IAB-2 or Document taxonomies in C#

A taxonomy, or classification, is an approach in which text is systematically identified and then organized. When you are dealing with a large amount of data (text or documents), it becomes hard to find the topic you need unless the data is classified or organized. Hence, you have to classify text in order to retrieve information quickly.

GroupDocs.Classification for .NET

GroupDocs offers a programmable document and text classification API for .NET developers. You just have to add a single DLL (GroupDocs.Classification for .NET) as a reference in your .NET project. The API allows developers to use two different taxonomies: IAB-2 (Interactive Advertising Bureau) and the Documents taxonomy.

IAB-2 text classification

IAB-2 categorizes texts into multiple topics and then identifies text based on the depth level. Call the Classify method with a text as the parameter to perform classification.

This text will be classified as Healthy_Living (IAB-2). Some more examples:

  • Sooner or later technology will overcome labor work – Technology_&_Computing (IAB-2)
  • This game has better graphics on Xiaomi Note 8 pro mobile – Video_Gaming (IAB-2)
  • We need groceries for the next month – Shopping (IAB-2)

Document taxonomy

Documents taxonomy is used to identify different document classes, such as Invoices, CVs, Forms, emails. Call Classify method for “document.pdf” file in the current directory with IAB-2 taxonomy and return 2 best results.

Call Classify method for “document.doc” file with Documents taxonomy, set precision/recall balance to “Precision” and return 4 best results.

The API also supports classification of password-protected documents.

Below are some helpful resources for you.

We recommend that you explore these resources and evaluate the API. If you run into any issue, you can raise it on the forum.

Posted in GroupDocs.Classification Product Family

View Contents of ZIP and TAR Archives using GroupDocs.Viewer for Java 19.11


We are excited to bring a major release of GroupDocs.Viewer for Java API packaging a bunch of new features, improvements, and bug fixes. In the latest release, we have added the support of viewing archives and a couple of code files as well as provided the features of working with security settings in the PDF documents. So let’s walk through the latest release of our document viewer API for Java and check out what you are going to get after upgrading to v19.11.

View ZIP and TAR Archives

The first and foremost feature of v19.11 is viewing the list of files and folders in the ZIP and TAR archives. This feature is quite handy when you want to view the list of the contents without extracting the archives.

A ZIP file packages multiple files or folders into a single archive that is also compressed to reduce its size. Similarly, TAR is a Unix archive format used to bundle files and folders; TAR by itself does not compress data, so it is often combined with a compressor such as gzip. Both are commonly grouped together as archive formats.

In the following sections, you will see how to view a list of contents from the ZIP or TAR archives without extracting.
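
For readers who want to see the underlying idea of listing contents without extraction, here is a minimal sketch that uses only the JDK's java.util.zip package. It is not the GroupDocs.Viewer API and produces no HTML, image, or PDF output; it only reads entry names from a ZIP file and reduces them to the items at the archive root. The class and method names (ZipRootListing, entryNames, rootItems) are hypothetical, and TAR is not covered because the JDK has no built-in TAR reader.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ZipRootListing {

    // Reads the entry names from the ZIP directory; no file content is extracted.
    static List<String> entryNames(Path zipPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(zipPath.toFile())) {
            zip.stream().map(ZipEntry::getName).forEach(names::add);
        }
        return names;
    }

    // Reduces full entry paths to the distinct items at the archive root.
    // Folders keep a trailing "/" so they can be told apart from files.
    static SortedSet<String> rootItems(Collection<String> names) {
        SortedSet<String> root = new TreeSet<>();
        for (String name : names) {
            int slash = name.indexOf('/');
            root.add(slash < 0 ? name : name.substring(0, slash + 1));
        }
        return root;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java ZipRootListing <archive.zip>");
            return;
        }
        rootItems(entryNames(Paths.get(args[0]))).forEach(System.out::println);
    }
}
```

Keeping rootItems separate from the file access makes the path logic easy to check with a plain list of names.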

View List of Contents in ZIP or TAR Archives

When rendering an archive file as HTML, GroupDocs.Viewer returns an HTML page containing the list of items that are at the root of the archive. In the case of rendering as image or PDF, the API returns one or more pages depending on the number of items. The following code sample demonstrates this feature.

View List of Folders from ZIP or TAR

ZIP or TAR archives may contain multiple files and folders. These folders may further contain files as well as subfolders. GroupDocs.Viewer also allows viewing the folders that are located at the root of the archive. The following code sample shows how to get a list of folders from a ZIP or TAR archive.

View List of Subfolders within a Certain Folder of ZIP or TAR

There might be a case when you need to obtain the list of subfolders within a root folder in the ZIP or TAR archive. For such a case, you can specify the folder name using ArchiveOptions.setFolderName("FolderName") and the API will return the list of subfolders.
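
As a rough illustration of what "subfolders within a folder" means in terms of archive entry paths, the following JDK-only sketch filters a list of entry names (such as those returned by java.util.zip.ZipEntry.getName()) down to the immediate subfolders of one folder. This is an assumption about the general idea, not a description of how GroupDocs.Viewer implements it; ZipSubfolderListing and subfolders are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class ZipSubfolderListing {

    // Returns the names of the immediate subfolders of `folder`,
    // derived from "/"-separated archive entry names.
    static SortedSet<String> subfolders(Collection<String> entryNames, String folder) {
        String prefix = folder.endsWith("/") ? folder : folder + "/";
        SortedSet<String> result = new TreeSet<>();
        for (String name : entryNames) {
            if (!name.startsWith(prefix)) {
                continue; // entry lives outside the requested folder
            }
            String rest = name.substring(prefix.length());
            int slash = rest.indexOf('/');
            if (slash > 0) {
                // Anything before the next "/" is a direct child folder.
                result.add(rest.substring(0, slash));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "docs/", "docs/b.txt", "docs/sub/c.txt", "docs/old/", "img/x.png");
        System.out.println(subfolders(names, "docs")); // prints [old, sub]
    }
}
```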

View List of Files within a Folder in ZIP or TAR

Now, once you have got the list of folders (and the subfolders as well), you can extract and view the items from your desired folder. The following code sample shows how to view the items of a specific folder in a ZIP or TAR.

For more details on rendering archives, please visit working with archives.

Working with Security Settings in PDF Documents

The PDF documents allow setting security parameters to restrict unauthorized access. The security can be enabled using:

  • Owner password – The password which is required to change document permissions.
  • User password – The password required to open the document.
  • PDF file permissions – The permissions to allow or deny printing, modification and data extraction.

In the latest release, we have added the ability to apply the above-mentioned security settings when rendering a file into a PDF document. The following code sample shows how to set the owner password, the user password, and the permissions to deny printing.

Detecting Security Settings in PDF Document

You can also check the security settings that are applied to a particular PDF document. For example, you can check if printing of the document is allowed or not as shown in the following code sample.

Support for Viewing Code Files

In addition to the support of ZIP and TAR files, we have also added the feature of viewing C# (.cs) and Visual Basic (.vb) code files.

Bug Fixes

The following is the list of bugs that are fixed in v19.11.

  • Output extension is empty when saving HTML page into cache
  • Object null reference exception when rendering DWG document
  • The Watermark opacity is set twice when rendering as HTML
  • The separator is wrong for the opacity value
  • File is corrupted or damaged exception for presentation documents
  • Unable to render .xls file with exception “file is corrupted or damaged”

Improvement

We have made the following improvements in v19.11.

  • Improved performance for rendering PSD image format into PDF
  • Improved rendering of DICOM, DNG, and WebP formats into PDF
  • Extended support for CellsOptions.setTextOverflowMode option for rendering the document into image
  • Extended support for CellsOptions.setTextOverflowMode option for rendering into PDF
  • Rendering contact photo from vCard file format (VCF)
  • Improved output for rendering ZIP archives

Well, this was a brief overview of the major features as well as improvements and bug fixes. You can also visit the release notes of GroupDocs.Viewer for Java 19.11 to know about the public API changes. Visit the documentation of GroupDocs.Viewer for Java for more details and code samples of every feature. You can download/clone the source code examples from the GitHub repository.

In case you find any issue while using our API, we are always available to provide you free support on our forum.

Posted in GroupDocs.Viewer Product Family

Convert EML or MSG file to PDF in C#

Do you want to build a web based, console or desktop application in C# that can convert an email file to PDF?

GroupDocs.Conversion for .NET is a one-stop solution for such a scenario. This API can be used in any of your (new or existing) .NET projects without any additional dependency. The supported email formats are:

  • MSG
  • EML
  • EMLX

API Usage and Implementation

Add the API reference to your project, either by downloading the DLL or by installing the NuGet package. After that, only a few lines of code are needed.

You will get all details in the output PDF like recipients, subject or any image attachment.

Have a look at the converted PDF file

Explore our open-source GitHub example project and the documentation. If you still face any issue, post it on the forum.

Posted in GroupDocs.Conversion Product Family

How to Redact in Word Processing Documents Using C# or Java Programming

How to redact in Word is a widely searched topic nowadays. C# or Java developers might want to upgrade their editing apps by adding a feature to redact Word or PDF documents, or they might need redacted text in many other kinds of documents. If you are a developer and want to enable text redaction in your app, the material below is for you.

Why redacting information in documents matters


Redacting documents that contain sensitive data is vital to everyone, from sole proprietors to government offices. Nobody wants their private or confidential data to fall into the wrong hands. The practice is also common among lawyers. Nevertheless, redacting Word documents properly is not so simple, and many people have made mistakes in this regard. Using incorrect methods to redact your data can lead to leakage of your private information. Common wrong techniques include deleting the text, changing the text color to white, or using obsolete apps to produce redacted text.

The following section shows how Java or C# developers can do it in code.

Text redaction in your word documents using C# or Java APIs

Using the GroupDocs.Redaction on-premise APIs, you can write a program to redact information in a variety of documents, or add a text redaction feature to your editing apps. If you are building redaction software, you might also be interested in metadata redactions, image redactions, annotation redactions, or spreadsheet redactions.

Let's see how simply C# or Java developers can write the code to redact Word documents.

The GroupDocs.Redaction APIs support text redaction in two ways:

  1. Exact phrase redaction
  2. Using regular expression

Using exact phrase redaction

The example below shows a textual redaction that replaces the exact phrase "John Doe" with "[personal]":

using (Redactor redactor = new Redactor(@"sample.docx"))
{
  redactor.Apply(new ExactPhraseRedaction("John Doe", new ReplacementOptions("[personal]")));
  redactor.Save();
}

Java code will look like this:

final Redactor redactor = new Redactor("sample.docx");
try
{
    redactor.apply(new ExactPhraseRedaction("John Doe", new ReplacementOptions("[personal]")));
    redactor.save();
}
finally { redactor.close(); }

For a case-sensitive redaction, you can pass an additional constructor parameter (there is also a corresponding public property), like this:

using (Redactor redactor = new Redactor(@"sample.docx"))
{
  redactor.Apply(new ExactPhraseRedaction("John Doe", true /*isCaseSensitive*/, new ReplacementOptions("[personal]")));
  redactor.Save();
}

Java code will look like this:

final Redactor redactor = new Redactor("sample.docx");
try
{
    redactor.apply(new ExactPhraseRedaction("John Doe", true /*isCaseSensitive*/, new ReplacementOptions("[personal]")));
    redactor.save();
}
finally { redactor.close(); }
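
To illustrate what the case-sensitivity flag changes, here is a small sketch on a plain string using java.util.regex. It does not use GroupDocs.Redaction and says nothing about how the library matches text internally; ExactPhraseSketch and replacePhrase are hypothetical names, and the assumption is simply that a case-insensitive match also catches differently cased variants of the phrase.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExactPhraseSketch {

    // Replaces every literal occurrence of `phrase` in `text` with `replacement`.
    static String replacePhrase(String text, String phrase,
                                boolean caseSensitive, String replacement) {
        int flags = caseSensitive ? 0 : Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        // Pattern.quote treats the phrase literally, so characters like "." are not regex syntax.
        Pattern pattern = Pattern.compile(Pattern.quote(phrase), flags);
        // quoteReplacement keeps "$" and "\" in the replacement from being interpreted.
        return pattern.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String text = "John Doe met john doe.";
        System.out.println(replacePhrase(text, "John Doe", true, "[personal]"));
        // prints "[personal] met john doe."
        System.out.println(replacePhrase(text, "John Doe", false, "[personal]"));
        // prints "[personal] met [personal]."
    }
}
```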

If you need a color box over the redacted text, you can use color instead of replacement string. The redaction will erase matched text and put a rectangle of the specified color in place of redacted text:

using (Redactor redactor = new Redactor(@"sample.docx"))
{
  redactor.Apply(new ExactPhraseRedaction("John Doe", new ReplacementOptions(System.Drawing.Color.Black)));
  redactor.Save();
}

Java code will look like this:

final Redactor redactor = new Redactor("sample.docx");
try
{
    redactor.apply(new ExactPhraseRedaction("John Doe", new ReplacementOptions(java.awt.Color.RED)));
    redactor.save();
}
finally { redactor.close(); }

Using regular expression

The example below shows how to redact any text matching "2 digits, optional whitespace, 2 digits, any run of non-digit characters (such as a space), and 6 digits" with a blue color box:

using (Redactor redactor = new Redactor(@"sample.docx"))
{
  redactor.Apply(new RegexRedaction("\\d{2}\\s*\\d{2}[^\\d]*\\d{6}", new ReplacementOptions(System.Drawing.Color.Blue)));
  redactor.Save();
}

Java code will look like this:

final Redactor redactor = new Redactor("sample.docx");
try
{
    redactor.apply(new RegexRedaction("\\d{2}\\s*\\d{2}[^\\d]*\\d{6}", new ReplacementOptions(java.awt.Color.BLUE)));
    SaveOptions saveOptions = new SaveOptions();
    saveOptions.setAddSuffix(true);
    saveOptions.setRasterizeToPDF(false);
    redactor.save(saveOptions);
}
finally { redactor.close(); }
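
To check which strings the regular expression above actually matches, it can be tried on plain text with java.util.regex before applying it to a document. The sketch below is independent of GroupDocs.Redaction: it substitutes a "[redacted]" marker instead of drawing a color box, and RegexRedactionSketch and mask are hypothetical names.

```java
import java.util.regex.Pattern;

public class RegexRedactionSketch {

    // Same pattern as in the redaction example above.
    static final Pattern ID_PATTERN =
            Pattern.compile("\\d{2}\\s*\\d{2}[^\\d]*\\d{6}");

    // Replaces every match with a fixed text marker.
    static String mask(String text) {
        return ID_PATTERN.matcher(text).replaceAll("[redacted]");
    }

    public static void main(String[] args) {
        System.out.println(mask("Id: 12 34 567890")); // prints "Id: [redacted]"
        System.out.println(mask("Id: 1234-567890"));  // prints "Id: [redacted]"
        System.out.println(mask("Id: 12 34 5678"));   // unchanged: fewer than 6 trailing digits
    }
}
```

Note that the [^\d]* part accepts any non-digit characters, so separators such as a hyphen match as well as a space.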

The complete, ready-to-run code sample is available on GitHub. You can explore the API references for both the Java and .NET versions. GroupDocs also offers a one-month free trial for both the Java and .NET APIs. Please visit here to get the trial license.

Posted in GroupDocs.Redaction Product Family

Monitor document conversion status and progress in C#

With the release of GroupDocs.Conversion for .NET 19.11 you can now monitor document conversion progress. There is one improvement and a few bug fixes introduced.

A new property Listener is added. The document converter listener implementation is used for monitoring conversion status and progress. Have a look at ConverterListener class that implements IConverterListener interface

Below is the usage

As for the improvement, MPP to XLS conversion has been improved. Previously, there was an issue with PNG image conversion (e.g. PNG to Word or Presentation); this issue is now fixed. Another issue occurred when converting a particular Word file to PDF, which failed with the exception "Could not create the bitmap with the specified parameters." This bug is also resolved.

Get the latest version from the download section. If you want to evaluate the API features, we have an open-source example project that can be downloaded from GitHub.

Posted in GroupDocs.Conversion Product Family

Redact Content in Apple’s Numbers Spreadsheet using GroupDocs.Redaction for Java 19.11


Technology is advancing at a rapid pace, and to keep up you need to improve every day. To help you enhance your applications and take them to the next level, we keep working to meet your emerging requirements by introducing new features and improving the existing ones. This is why we have introduced an optimized and simplified version of our Java document sanitization and text redaction API, GroupDocs.Redaction for Java.

The v19.11 of GroupDocs.Redaction for Java has been released with a new public API and a couple of enhancements. So let’s have a look at the enhancements and changes we have done in this version.

Support of Numbers Spreadsheets

Numbers is Apple's application for creating and viewing spreadsheet documents on iOS or macOS. The spreadsheets created with this application are stored with the .numbers extension and are quite similar to other spreadsheets, for example, those created with MS Excel. We have extended the list of our supported spreadsheet formats in the latest release and added the ability to redact content in Numbers spreadsheets. Visit spreadsheet redaction for more details on text redaction in spreadsheets.

Setting PDF Compliance Level

GroupDocs.Redaction also allows saving the redacted document into a rasterized PDF document. Since the PDF documents may possess different compliance levels such as PDF/A-1a, PDF/A-1b, we have made it possible for you to set the compliance level of the resultant PDF document as per your choice. For this, the enum PdfComplianceLevel has been added to com.groupdocs.redaction.options package. The following code sample shows how to set the PDF compliance level.

Breaking Changes

In v19.11, we have introduced a new public API which is designed to be simple and easy to use. The following are some notable changes we have made in this version and if you are already using the API, you will face these breaking changes once you upgrade.

  • The Redactor class is introduced to manage the document redaction process (instead of Document class from previous versions).
  • The methods redactWith() of the Document class are replaced with similar apply() methods in the Redactor class. 
  • The classes RedactionSummary, RedactionLogRecord, and MetadataFilter have been renamed to RedactorChangeLog, RedactorLogRecord, and MetadataFilters respectively.
  • A number of new exception classes and base exception class for GroupDocs.Redaction exceptions are added.
  • The constructor LoadOptions(DocumentFormatConfiguration) has been removed.
  • All the obsolete members have been removed from the public API.

Please visit the migration notes to see how the classes, methods, and their usage have changed in v19.11.

Try out the latest release by downloading or cloning the source code examples from GitHub. Visit documentation for more details on how to redact, hide, or find and replace text, metadata, and annotations in Word, Excel, PowerPoint, PDF, and image formats.

If you run into any difficulty, feel free to let us know via our forum.

Posted in GroupDocs.Redaction Product Family