GroupDocs.Parser for Java

We are pleased to announce that the first version of GroupDocs.Parser for Java has been released. GroupDocs.Parser for Java allows Java developers to extract raw and formatted text from popular document formats. The API also supports working with containers such as ZIP archives and email containers. You can also access the metadata attached to documents with a few lines of code. Read on to learn more about the features and file formats supported by the API.

Supported Features

GroupDocs.Parser for Java offers the following key features:

  • Extract text from various document formats
  • Extract main document properties
  • Extract text and metadata from containers (PST, OST, ZIP containers are currently supported)
  • Extract text and metadata from mail servers (POP, IMAP and Microsoft Exchange Server are supported)
  • Extract formatted text (plain text, Markdown, and HTML formatters are available)
  • Extract structured text
  • Support for password-protected documents (the password can be provided when it is required)
  • Service functions like encoding detection, media type detection and the ability to connect the logger
  • Search text in documents
  • Text analysis API (PDF format is currently supported)

For more details on supported features, please visit the article: Features Overview.

Supported File Formats

The following is the list of file formats supported by GroupDocs.Parser.

  1. Text Document Formats (.doc/.docx/.dot/.rtf/.docm/.odt/.xml/.txt/.md)
  2. Presentation Document Formats (.ppt/.pptx/.pps/.pptm/.ppsm/.ppsx/.odp)
  3. Spreadsheet Document Formats (.xls/.xlsx/.xlsm/.xlsb/.csv/.ods/Tab Separated Values/SpreadsheetML (.xml))
  4. OneNote Documents (.one)
  5. Emails (.msg/.eml/.emlx/TNEF/.pst/.ost/POP/IMAP)
  6. Electronic Publication Formats (.epub/.fb2 (FictionBook))
  7. Portable Document Format (.pdf/PDF Portfolio/Encrypted PDF)
  8. DOM-based Documents (.xml/.html/.xhtml/.mhtml)
  9. Compression and Packaging Formats (.zip/.chm)

For more details on supported formats, please visit the article: Supported File Formats.

Example Business Cases

Repetition can diminish the value of an article. As a writer, you should follow the DRY (don’t repeat yourself) principle, but rereading an article again and again to spot repeated wording costs a lot of time. Counting word occurrences would serve the goal, yet doing it manually is hard: you end up reading the whole article and keeping track of the words yourself. GroupDocs.Parser can help in this case. To illustrate real-life needs, we have put together some real-life cases. Please feel free to visit the article: Working with Business Cases.
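
The word-counting idea can be sketched in plain Java. This is only a minimal illustration and does not use the GroupDocs.Parser API: it assumes the article text has already been extracted into a String, and the class name WordFrequency and its methods countWords and repeatedWords are hypothetical names chosen for this example.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical helper, not part of GroupDocs.Parser: counts word occurrences
// in text that is assumed to have been extracted from a document already.
final class WordFrequency {

    // Returns a map from lower-cased word to the number of times it occurs.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        // Split on any run of characters that is not a letter, digit or apostrophe.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}']+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns the words that occur at least minCount times,
    // most frequent first, ties broken alphabetically.
    static List<Map.Entry<String, Integer>> repeatedWords(String text, int minCount) {
        return countWords(text).entrySet().stream()
                .filter(e -> e.getValue() >= minCount)
                .sorted(Map.Entry.<String, Integer>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Placeholder standing in for text extracted from an article.
        String extracted = "Parser extracts text. Text extraction saves time, and time matters.";
        for (Map.Entry<String, Integer> entry : repeatedWords(extracted, 2)) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

Words that come back with a high count are candidates for rewording. A real workflow would likely also filter out common stop words such as "the" and "and", which this sketch leaves in.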

Available Channels and Resources

Here are a few channels and resources for you to download, learn, try and get technical support on GroupDocs.Parser:

Feedback

As always, if you have any questions or suggestions, feel free to post them on our forum.