GroupDocs.Markdown for Python via .NET — Export Documents to Markdown

Bring your documents into AI pipelines — straight from Python, on-premise, with one pip install.

Today we’re shipping the first public release of GroupDocs.Markdown for Python via .NET on PyPI. The library converts PDF, Word, Excel, EPUB, and 20+ more formats into clean, semantic Markdown — the format that LLMs, RAG pipelines, and static site generators work with best.

If you’ve been following the .NET release from September (or the full API overhaul in 26.3), the rationale is the same: document formatting carries semantics, and preserving that semantic structure is what makes a RAG system give good answers. The earlier post covers the problem (OCR flattens structure, LLMs need markdown) and the solution (a DOM-based renderer that walks the document and emits Markdown) in depth — we won’t repeat that story here.

Instead, let’s focus on what’s new for Python developers.

What you get

A single wheel, no runtime dependencies. pip install groupdocs-markdown-net pulls a self-contained wheel that bundles the .NET runtime and its native libraries. No dotnet install, no Microsoft Office, no Adobe Acrobat, no cloud services. The one exception we know of is ICU on Linux, which the bundled .NET runtime expects to find on the system.
Cross-platform. Windows x64/x86, Linux x64, macOS x64 and Apple Silicon (ARM64). Python 3.5 through 3.14.
A pythonic API. Classes use PascalCase, methods and properties use snake_case, enum values use UPPER_SNAKE_CASE. Context managers dispose loaded documents deterministically.
Truly async. Every static and instance method has an _async counterpart. File I/O is asynchronous and the CPU-bound conversion runs on a worker thread — your asyncio event loop stays free.
AI-agent friendly. The installed wheel bundles an AGENTS.md file so coding assistants (Claude Code, Cursor, GitHub Copilot, Codex) auto-discover the API surface, idiomatic usage patterns, and troubleshooting tips. Documentation is also published as llms.txt, a single-file corpus (llms-full.txt), per-page Markdown, and an MCP server — see AI-friendly by design below for details.

Get started

pip install groupdocs-markdown-net

The simplest conversion is a one-liner:

from groupdocs.markdown import MarkdownConverter

# Convert to a string
md = MarkdownConverter.to_markdown("business-plan.docx")

# Or write directly to a file
MarkdownConverter.to_file("business-plan.docx", "business-plan.md")

That’s it — no configuration, no options, no boilerplate. Evaluation mode processes the first 3 pages and adds a watermark. To remove the limits, apply a license:

from groupdocs.markdown import License

License().set_license("path/to/license.lic")

Or set the GROUPDOCS_LIC_PATH environment variable to the path of your license file, and the license will be applied automatically on import.
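
Because the license is picked up at import time, a mistyped path is easy to miss until the watermark shows up. Here is a small pre-flight check you could run first. It is only a sketch: license_path_from_env is a hypothetical helper of ours, not part of the library, and it does nothing more than inspect the environment.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def license_path_from_env(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the file named by GROUPDOCS_LIC_PATH, or None if unset or missing.

    Hypothetical helper: it only checks the variable and the file system,
    it does not load or validate the license itself.
    """
    raw = env.get("GROUPDOCS_LIC_PATH", "").strip()
    if not raw:
        return None
    path = Path(raw).expanduser()
    return path if path.is_file() else None
```

Calling license_path_from_env() before importing groupdocs.markdown and logging a warning when it returns None is usually enough to catch a broken deployment early.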

Supported formats

The Python package handles the same breadth of formats as the .NET library:

PDF — .pdf
Word / Rich Text — .doc, .docx, .docm, .dot, .dotx, .dotm, .rtf, .odt, .ott
Spreadsheets — .xls, .xlsx, .xlsb, .xlsm, .csv, .tsv, .ods, .ots
eBooks — .epub, .mobi
Text / Markup / Help — .txt, .xml, .chm
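
When you point a batch job at a mixed folder, it helps to filter out files the converter will not accept before calling it. The sketch below hard-codes the extensions from the list above; SUPPORTED_EXTENSIONS and convertible are names we made up for illustration, and the set may need adjusting for the version you have installed.

```python
from pathlib import Path
from typing import Iterable, List

# Extensions copied from the format list in this post (an assumption about
# your installed version; the library does not expose this constant).
SUPPORTED_EXTENSIONS = frozenset({
    ".pdf",
    ".doc", ".docx", ".docm", ".dot", ".dotx", ".dotm", ".rtf", ".odt", ".ott",
    ".xls", ".xlsx", ".xlsb", ".xlsm", ".csv", ".tsv", ".ods", ".ots",
    ".epub", ".mobi",
    ".txt", ".xml", ".chm",
})


def convertible(paths: Iterable[Path]) -> List[Path]:
    """Keep only paths whose suffix (case-insensitive) is in the supported set."""
    return [p for p in paths if p.suffix.lower() in SUPPORTED_EXTENSIONS]
```

For example, convertible(Path("inbox").rglob("*")) gives you the candidates in a folder tree.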

Pythonic examples

Conversion options and image strategies

from groupdocs.markdown import (
    MarkdownConverter,
    ConvertOptions,
    MarkdownFlavor,
    ExportImagesToFileSystemStrategy,
)

strategy = ExportImagesToFileSystemStrategy("output/images")
strategy.images_relative_path = "images"  # ![](images/img-001.png)

options = ConvertOptions()
options.flavor = MarkdownFlavor.GIT_HUB
options.heading_level_offset = 1      # # Title -> ## Title
options.include_front_matter = True    # prepend YAML metadata
options.image_export_strategy = strategy

MarkdownConverter.to_file("report.docx", "output/report.md", convert_options=options)
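
With images exported to the file system, the Markdown refers to them by relative path, so the .md file and the image folder have to travel together. A quick way to confirm that nothing got lost when you move or publish the output is to resolve every image link against the Markdown file. This is a plain-Python sketch that does not use the library; missing_images is a hypothetical helper and the regex only covers simple inline images.

```python
import re
from pathlib import Path
from typing import List

# Matches inline Markdown images such as ![alt](images/img-001.png)
IMAGE_LINK = re.compile(r"!\[[^\]]*\]\(([^)\s]+)")


def missing_images(markdown_file: Path) -> List[str]:
    """Return relative image targets in markdown_file that do not exist on disk."""
    text = markdown_file.read_text(encoding="utf-8")
    base = markdown_file.parent
    missing = []
    for target in IMAGE_LINK.findall(text):
        if "://" in target:  # remote image, nothing to check locally
            continue
        if not (base / target).is_file():
            missing.append(target)
    return missing
```

For the example above, missing_images(Path("output/report.md")) should come back empty when output/images sits next to the report.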

Document inspection without conversion

from groupdocs.markdown import MarkdownConverter

info = MarkdownConverter.get_info("business-plan.docx")
print(f"{info.file_format}, {info.page_count} pages, author: {info.author}")

Loading a password-protected file

from groupdocs.markdown import MarkdownConverter, LoadOptions, FileFormat

load_opts = LoadOptions(FileFormat.DOCX)
load_opts.password = "secret"

MarkdownConverter.to_file("protected.docx", "output.md", load_options=load_opts)

Streams and context managers

from groupdocs.markdown import MarkdownConverter

with open("document.docx", "rb") as stream:
    with MarkdownConverter(stream) as converter:
        converter.convert("document.md")

Async API — converting many documents concurrently

Because file I/O is asynchronous and the conversion itself runs on a worker thread, asyncio.gather() can run several conversions concurrently without blocking the event loop:

import asyncio
from groupdocs.markdown import MarkdownConverter

async def convert_many():
    await asyncio.gather(
        MarkdownConverter.to_file_async("a.docx", "a.md", None),
        MarkdownConverter.to_file_async("b.pdf",  "b.md", None),
        MarkdownConverter.to_file_async("c.xlsx", "c.md", None),
    )

asyncio.run(convert_many())

This makes the library a natural fit for ASGI frameworks like FastAPI: because conversion runs off the event loop, a single worker process can keep accepting requests while documents are being converted.
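
A plain asyncio.gather() starts every conversion at once, which can be more than you want when a request uploads hundreds of files. One common way to cap that is an asyncio.Semaphore. The sketch below takes the conversion coroutine as a parameter, so convert_bounded and the Convert alias are our own names; convert stands in for a small wrapper around MarkdownConverter.to_file_async.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, Tuple

# Stand-in type for a coroutine function taking (source, target).
# Wrapping MarkdownConverter.to_file_async this way is an assumption.
Convert = Callable[[str, str], Awaitable[None]]


async def convert_bounded(
    jobs: Iterable[Tuple[str, str]],
    convert: Convert,
    limit: int = 4,
) -> List[object]:
    """Run convert(source, target) for each job, at most `limit` at a time.

    Returns one entry per job, in order: None on success, the exception on failure.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run_one(source: str, target: str) -> None:
        async with semaphore:  # wait here while `limit` jobs are in flight
            await convert(source, target)

    return await asyncio.gather(
        *(run_one(src, dst) for src, dst in jobs),
        return_exceptions=True,  # one failure does not cancel the rest
    )
```

With the library installed, convert could be lambda src, dst: MarkdownConverter.to_file_async(src, dst, None), mirroring the call in the snippet above. The right limit depends on your CPU count and document sizes, so treat 4 as a placeholder.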

Error handling

All conversion methods raise on failure, with specific exception types for common scenarios:

from groupdocs.markdown import (
    MarkdownConverter,
    DocumentProtectedException,
    InvalidFormatException,
    GroupDocsMarkdownException,
)

try:
    MarkdownConverter.to_file("annual-report.docx", "annual-report.md")
except DocumentProtectedException:
    print("Wrong or missing password.")
except InvalidFormatException:
    print("File is corrupt or unsupported.")
except GroupDocsMarkdownException as ex:
    print(f"Conversion failed: {ex}")
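
In a batch job you usually want one bad file to be recorded rather than abort the run. A minimal sketch of that pattern follows; convert_each is a hypothetical helper, and convert is whatever callable you pass in (for instance MarkdownConverter.to_file), so the snippet itself has no dependency on the library.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable


def convert_each(
    sources: Iterable[Path],
    out_dir: Path,
    convert: Callable[[str, str], None],
) -> Dict[Path, Exception]:
    """Convert every source into out_dir/<stem>.md and return the failures.

    `convert` is expected to raise on failure, as the library's methods do.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    failures: Dict[Path, Exception] = {}
    for source in sources:
        target = out_dir / (source.stem + ".md")
        try:
            convert(str(source), str(target))
        except Exception as ex:  # in real code, narrow this to GroupDocsMarkdownException
            failures[source] = ex
    return failures
```

The returned mapping tells you which files to retry, for example with a password once you know a document is protected.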

Built for RAG and LLM pipelines

Markdown is the preferred input format for embedding models and retrieval pipelines — it preserves headings, lists, tables, and emphasis while being easy to chunk and tokenize. A typical RAG ingestion looks like this:

import re
from groupdocs.markdown import MarkdownConverter, ConvertOptions, SkipImagesStrategy, MarkdownFlavor

options = ConvertOptions()
options.image_export_strategy = SkipImagesStrategy()   # text-only for RAG
options.flavor = MarkdownFlavor.COMMON_MARK

MarkdownConverter.to_file("business-plan.pdf", "business-plan.md", convert_options=options)

with open("business-plan.md", "r", encoding="utf-8") as f:
    markdown = f.read()

# Split on level-1 and level-2 headings (the "# " markers are consumed by the split),
# then embed/index each chunk
chunks = [c for c in re.split(r"\n#{1,2} ", markdown) if c.strip()]

Because the library runs entirely on-premise, sensitive documents never leave your environment — a common requirement for regulated industries, legal teams, and internal knowledge bases.
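
One detail of the re.split call above: it consumes the heading markers and misses a heading on the very first line, so chunks lose their titles. If you want each chunk to keep its heading (often useful as retrieval context), a zero-width split is one option. This sketch needs Python 3.7 or later, where re.split accepts empty matches; split_by_heading is our own name, and it does not special-case # lines inside fenced code blocks.

```python
import re
from typing import List

# Zero-width split point in front of every level-1 or level-2 ATX heading.
HEADING_START = re.compile(r"(?m)^(?=#{1,2} )")


def split_by_heading(markdown: str) -> List[str]:
    """Split Markdown into chunks that each start with their own # or ## heading.

    Text before the first heading becomes its own chunk; deeper headings
    (### and below) stay inside the chunk of their parent.
    """
    return [chunk.strip() for chunk in HEADING_START.split(markdown) if chunk.strip()]
```

Each returned string can go straight to your embedding call, with the heading line doubling as a cheap chunk title.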

AI-friendly by design

Most Python SDKs treat AI coding assistants as an afterthought — a developer still has to point the agent at documentation, paste in examples, or debug through trial and error. GroupDocs.Markdown for Python via .NET flips that: the library is designed so that agents like Claude Code, Cursor, GitHub Copilot, and Codex can pick it up without any manual setup.

AGENTS.md ships inside the wheel

This is the first GroupDocs package to bundle an AGENTS.md file directly inside the installed wheel. The file follows the emerging AGENTS.md convention — a plain-Markdown README specifically written for AI coding assistants rather than humans.

When you run pip install groupdocs-markdown-net, a file lands at:

site-packages/groupdocs/markdown/AGENTS.md

An AI assistant opening your project can read it and immediately learn:

The full public API surface (classes, methods, enums, exceptions) and how they relate.
Idiomatic usage patterns for the most common scenarios — static vs instance API, sync vs async, image strategies, front matter, error handling.
Common pitfalls and how to avoid them — e.g. which ConvertOptions overloads accept None, how to handle password-protected files, how to capture conversion warnings.
Troubleshooting for platform-specific issues (libSkiaSharp on macOS, ICU on Linux).

In practice this means you can say “use groupdocs-markdown-net to convert this folder of PDFs to Markdown for my RAG pipeline” and the agent writes working code on the first try — no hallucinated method names, no wrong argument order, no guessed imports.
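
If your agent does not find the file by itself, you can print its location and pass it along. The sketch below looks for AGENTS.md in the package directory given earlier in this section; find_agents_md is a hypothetical helper built on the standard importlib machinery, and it returns None when the package is not installed.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_agents_md(package: str = "groupdocs.markdown") -> Optional[Path]:
    """Return the path to AGENTS.md inside an installed package, or None."""
    try:
        spec = importlib.util.find_spec(package)
    except ImportError:  # raised when the parent package is not installed
        return None
    if spec is None or not spec.submodule_search_locations:
        return None
    for location in spec.submodule_search_locations:
        candidate = Path(location) / "AGENTS.md"
        if candidate.is_file():
            return candidate
    return None
```

Note that resolving a dotted name imports the parent package, so run this in the same virtual environment your project uses.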

Machine-readable documentation

For agents that need to look up something that isn’t in AGENTS.md, the full product documentation is also published in machine-readable form:

Single-file corpus: the complete docs as one concatenated Markdown file, ready to drop into an agent’s context window, at https://docs.groupdocs.com/markdown/python-net/llms-full.txt
Per-page Markdown: append .md to any docs URL to fetch the raw source, for example https://docs.groupdocs.com/markdown/python-net/quick-start-guide.md
llms.txt index: an llms.txt-style table of contents that points agents at the pages they need, at https://docs.groupdocs.com/markdown/python-net/llms.txt
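
The per-page rule is simple enough to script. Here is a sketch that turns a docs page URL into its raw-Markdown URL; markdown_source_url is a hypothetical helper, and the handling of trailing slashes, query strings, and fragments is our assumption about how page URLs look, not documented behavior of the site.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_source_url(page_url: str) -> str:
    """Turn a docs page URL into its raw-Markdown URL by appending .md to the path."""
    parts = urlsplit(page_url)
    path = parts.path.rstrip("/")
    if not path.endswith(".md"):
        path += ".md"
    # Query string and fragment are dropped; they do not apply to the raw file.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

The result can be fetched with urllib.request.urlopen or any HTTP client and handed to the agent as context.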

MCP server for live doc lookups

For agents that speak Model Context Protocol, we expose the docs as an MCP server. Add this to your Claude Code or Cursor config:

{
  "mcpServers": {
    "groupdocs-docs": {
      "url": "https://docs.groupdocs.com/mcp"
    }
  }
}

After that, your agent can query the documentation on demand instead of relying on training data that may be stale.

Markdown in, Markdown out

There’s a nice symmetry here: the library’s output is Markdown — the format LLMs parse best for RAG — and its documentation is also Markdown, served as a single file for easy context-window ingestion. Whether you’re asking an agent to write code that uses the library, or asking an agent to understand your documents via the library, Markdown is the common medium.

Export example

The snippets above are close to the shortest useful program you can write with the library. Here’s the same idea packaged as a runnable project — source document, Python script, pre-generated output, requirements.txt, and a Dockerfile — so you can try it end-to-end without writing anything from scratch.

Source DOCX

The source file business-plan.docx is a short, richly-formatted business plan with headings, tables, images, and metadata.

Python script

from groupdocs.markdown import MarkdownConverter, ConvertOptions, MarkdownFlavor

def quick_example():
    """Convert a Word document to Markdown with GitHub flavor and YAML front matter."""

    # One-liner — returns a Markdown string
    md = MarkdownConverter.to_markdown("business-plan.docx")

    # With options — writes to a file
    options = ConvertOptions()
    options.flavor = MarkdownFlavor.GIT_HUB
    options.include_front_matter = True
    options.heading_level_offset = 1
    MarkdownConverter.to_file("business-plan.docx", "quick-example.md", convert_options=options)

if __name__ == "__main__":
    quick_example()

Output Markdown

The quick-example.md output starts with a YAML front-matter block auto-extracted from the document metadata, followed by the converted content with GitHub Flavored Markdown tables and a heading hierarchy shifted down one level (ready to embed inside a larger document).
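
If a downstream step wants the metadata and the body separately, the front matter is easy to peel off. The sketch below assumes the conventional layout, a line of three dashes before and after the YAML lines; we have not shown the exact block the library emits here, so treat the delimiters as an assumption. split_front_matter is our own helper and leaves YAML parsing to whatever library you prefer.

```python
from typing import Tuple


def split_front_matter(markdown: str) -> Tuple[str, str]:
    """Split Markdown into (front_matter, body).

    Assumes the conventional layout: a first line of '---', the YAML lines,
    then a closing '---' line. Returns ('', markdown) if that layout is absent.
    """
    lines = markdown.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", markdown
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            return "".join(lines[1:index]), "".join(lines[index + 1:])
    return "", markdown  # opening delimiter without a closing one
```

Reading quick-example.md and passing its text through split_front_matter gives you the metadata block for indexing and the body for chunking.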

Runnable sample app

Everything bundled together: sample-app.zip. Unzip, then:

cd src
python -m venv .venv
# Windows: .venv\Scripts\activate
# Linux/macOS: source .venv/bin/activate
pip install -r requirements.txt
python quick_example.py

Or run it in Docker — the included Dockerfile sets up the ICU dependency the bundled .NET runtime needs on Linux:

cd src
docker build -t groupdocs-markdown-python-example .
docker run --rm -v "$(pwd)/output:/app/output" groupdocs-markdown-python-example

Summary

GroupDocs.Markdown for Python via .NET brings the full document-to-Markdown conversion engine to Python as a self-contained wheel — no external runtime, no cloud, no surprises. A pythonic API, async support, and first-class AI tooling integration make it a practical choice for Python teams building RAG systems, static site generators, or document processing pipelines.

Learn more

PyPI package: https://pypi.org/project/groupdocs-markdown-net/
Product home: https://products.groupdocs.com/markdown/python-net/
Documentation: https://docs.groupdocs.com/markdown/python-net/
Release notes: https://releases.groupdocs.com/markdown/python-net/release-notes/
Code examples on GitHub: https://github.com/groupdocs-markdown/GroupDocs.Markdown-for-Python-via-.NET
License information: https://about.groupdocs.com/legal/
Related .NET release post: GroupDocs.Markdown for .NET — First Public Release

Support & feedback

For questions or technical assistance, please use our Free Support Forum — we’ll be happy to help.
