GroupDocs.Parser for Python 25.12 – Latest Updates and Fixes (December 2025)

We’re happy to announce the first release of GroupDocs.Parser for Python via .NET 25.12, available as of December 2025. This initial version brings the full power of the .NET parsing engine to Python developers, enabling extraction of text, images, attachments, barcodes, OCR content, and structured data from a wide range of document formats.

What’s new in this release

Major features

Text extraction – Retrieve plain or formatted text from PDFs, Office documents, emails, e‑books, archives and more.
Advanced search – Page‑level access with case‑sensitive, whole‑word, and regular‑expression search options.
Structured content parsing – Detect and extract document hierarchy such as headings, paragraphs, tables and custom text areas.
Template parsing – Use predefined templates to pull strongly‑typed fields from invoices, receipts and other business documents.
Image extraction – Pull embedded raster images from supported document and image formats.
Attachment extraction – Export file attachments embedded in documents.
Barcode scanning – Detect and read barcodes present in documents.
OCR support – Perform optical character recognition on scanned PDFs and raster images, with optional spell‑checking.
Metadata extraction – Access document properties like author, creation date, and custom metadata.
Table of contents extraction – Retrieve TOC structures from supported formats.
Hyperlink extraction – Extract hyperlinks (currently limited to a subset of formats).
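
To make the three search options named above concrete, here is a minimal sketch built on Python's standard re module. It does not use the GroupDocs.Parser search API, whose call signatures are not shown in these notes; the helper names build_pattern and search_pages and the sample page strings are hypothetical.

```python
import re


def build_pattern(query, *, match_case=False, whole_word=False, use_regex=False):
    """Compile a pattern honoring case-sensitive, whole-word, and regex options."""
    # Treat the query as literal text unless regex mode is requested
    body = query if use_regex else re.escape(query)
    if whole_word:
        # \b anchors the match at word boundaries on both sides
        body = r"\b(?:" + body + r")\b"
    flags = 0 if match_case else re.IGNORECASE
    return re.compile(body, flags)


def search_pages(pages, query, **options):
    """Yield (page_index, start_offset, matched_text) for every hit."""
    pattern = build_pattern(query, **options)
    for page_index, page_text in enumerate(pages):
        for match in pattern.finditer(page_text):
            yield page_index, match.start(), match.group()


# Hypothetical per-page text, standing in for text extracted from a document
pages = [
    "Invoice total: 120 USD",
    "Subtotal and total are listed. TOTAL due: 45 USD",
]
hits = list(search_pages(pages, "total", whole_word=True))
for page_index, offset, text in hits:
    print(page_index, offset, text)
```

With whole_word=True the "total" inside "Subtotal" is skipped, while the default case-insensitive mode still reports "TOTAL".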

Supported document formats

Word processing – DOC, DOCX, RTF, TXT, ODT
PDF & markup – PDF, HTML/MHTML, Markdown, XML
Spreadsheets – XLS, XLSX, ODS, CSV
Presentations – PPT, PPTX, ODP
Email & notes – PST, OST, EML, MSG, ONE
eBooks & web content – EPUB, MOBI, AZW3, CHM, FB2
Images – JPEG, PNG, TIFF, GIF, BMP, SVG
Archives & containers – ZIP, RAR, 7Z, TAR, GZ, BZ2
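
When processing mixed input folders, it can help to check a file against the list above before handing it to the parser. The following is a small sketch under stated assumptions: the mapping is copied from the list, the helper name format_category is hypothetical, and the file extensions chosen for Markdown (md), JPEG (jpg) and TIFF (tif) are assumptions, since the list names formats rather than extensions.

```python
from pathlib import Path

# Format groups copied from the list above; "md", "jpg" and "tif" are assumed extensions
SUPPORTED_FORMATS = {
    "Word processing": {"doc", "docx", "rtf", "txt", "odt"},
    "PDF & markup": {"pdf", "html", "mhtml", "md", "xml"},
    "Spreadsheets": {"xls", "xlsx", "ods", "csv"},
    "Presentations": {"ppt", "pptx", "odp"},
    "Email & notes": {"pst", "ost", "eml", "msg", "one"},
    "eBooks & web content": {"epub", "mobi", "azw3", "chm", "fb2"},
    "Images": {"jpeg", "jpg", "png", "tiff", "tif", "gif", "bmp", "svg"},
    "Archives & containers": {"zip", "rar", "7z", "tar", "gz", "bz2"},
}


def format_category(path):
    """Return the format group for a file name, or None if it is not listed."""
    # Only the last suffix is inspected, so "backup.tar.gz" is looked up as "gz"
    extension = Path(path).suffix.lower().lstrip(".")
    for category, extensions in SUPPORTED_FORMATS.items():
        if extension in extensions:
            return category
    return None


print(format_category("report.PDF"))
print(format_category("movie.mp4"))
```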

Platform support

Windows, Linux, and macOS
Python 3.5+

Installation

Download the appropriate WHL package for your platform from the GroupDocs Releases page:
- Windows x64
- Windows x32
- Linux
- macOS
- macOS ARM
Install the package with pip, substituting the full name of the file you downloaded for the wildcard pattern:

pip install groupdocs_parser_net-25.12-*.whl
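
If you are unsure which of the five packages matches your machine, the standard library can tell you. This is a rough sketch, not part of the product: the function name wheel_variant is hypothetical, and the returned labels simply mirror the download list above.

```python
import platform
import sys


def wheel_variant(system=None, machine=None, is_64bit=None):
    """Map the running interpreter to one of the package labels listed above."""
    system = (system or platform.system()).lower()
    machine = (machine or platform.machine()).lower()
    if is_64bit is None:
        # Interpreter bitness matters for wheels, not just the OS architecture
        is_64bit = sys.maxsize > 2**32
    if system == "windows":
        return "Windows x64" if is_64bit else "Windows x32"
    if system == "linux":
        return "Linux"
    if system == "darwin":
        # Apple Silicon interpreters report "arm64"
        return "macOS ARM" if machine == "arm64" else "macOS"
    raise ValueError("No package listed for platform {!r}".format(system))
```

Calling wheel_variant() with no arguments inspects the current interpreter; the parameters exist only so the mapping can be checked for other platforms.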

Getting started

The following snippet shows how to extract plain text from a PDF file:

from groupdocs.parser import Parser

# Create a Parser instance for your document
with Parser("sample.pdf") as parser:
    # Extract text from the document
    text = parser.GetText()
    
    # Print all extracted text to the console
    print(text)

For more complex scenarios, such as using templates, OCR, or barcode scanning, refer to the API reference and the code samples repository.
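
The single-file snippet above extends naturally to a whole folder. The sketch below reuses only the calls shown in that snippet (Parser as a context manager and GetText); the helper names extract_text and extract_folder, the extractor parameter, and the "*.pdf" default are hypothetical. These notes do not state what type GetText returns, so the result is stored as-is.

```python
from pathlib import Path


def extract_text(path):
    """Extract text from one file using the calls shown in the snippet above."""
    # Imported lazily so extract_folder can be exercised without the package installed
    from groupdocs.parser import Parser

    with Parser(str(path)) as parser:
        return parser.GetText()


def extract_folder(folder, pattern="*.pdf", extractor=extract_text):
    """Return {file name: extracted text} for every matching file in a folder."""
    results = {}
    # Sorted so the processing order is stable across runs
    for path in sorted(Path(folder).glob(pattern)):
        results[path.name] = extractor(path)
    return results
```

For example, extract_folder("invoices") would process every PDF in a hypothetical invoices directory; passing a different extractor makes it possible to swap in OCR or another extraction routine later.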

How to get the update

Direct download – Choose the WHL package matching your OS from the GroupDocs Releases page.
pip upgrade – Once a newer version is published, upgrade with:

pip install --upgrade groupdocs_parser_net
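
To confirm which version is installed before or after upgrading, you can query the package metadata. This is a minimal sketch: the function name installed_version is hypothetical, the distribution name is taken from the pip command above, and importlib.metadata requires Python 3.8 or later.

```python
from importlib import metadata


def installed_version(distribution="groupdocs_parser_net"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print(version if version is not None else "GroupDocs.Parser is not installed")
```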
