Bring your documents into AI pipelines — straight from Python, on-premise, with one pip install.
Today we’re shipping the first public release of GroupDocs.Markdown for Python via .NET on PyPI. The library converts PDF, Word, Excel, EPUB, and 20+ more formats into clean, semantic Markdown — the format that LLMs, RAG pipelines, and static site generators work with best.
If you’ve been following the .NET release from September (or the full API overhaul in 26.3), the rationale is the same: document formatting carries semantics, and preserving that semantic structure is what makes a RAG system give good answers. The earlier post covers the problem (OCR flattens structure, LLMs need markdown) and the solution (a DOM-based renderer that walks the document and emits Markdown) in depth — we won’t repeat that story here.
Instead, let’s focus on what’s new for Python developers.
What you get
- A single wheel, no runtime dependencies. pip install groupdocs-markdown-net pulls a self-contained wheel that bundles the .NET runtime and every native library it needs. No dotnet install, no Microsoft Office, no Adobe Acrobat, no cloud services.
- Cross-platform. Windows x64/x86, Linux x64, macOS x64 and Apple Silicon (ARM64). Python 3.5 through 3.14.
- A pythonic API. Classes use PascalCase, methods and properties use snake_case, enum values use UPPER_SNAKE_CASE. Context managers dispose loaded documents deterministically.
- Truly async. Every static and instance method has an _async counterpart. File I/O is asynchronous and the CPU-bound conversion runs on a worker thread, so your asyncio event loop stays free.
- AI-agent friendly. The installed wheel bundles an AGENTS.md file so coding assistants (Claude Code, Cursor, GitHub Copilot, Codex) auto-discover the API surface, idiomatic usage patterns, and troubleshooting tips. Documentation is also published as llms.txt, a single-file corpus (llms-full.txt), per-page Markdown, and an MCP server; see AI-friendly by design below for details.
Get started
pip install groupdocs-markdown-net
The simplest conversion is a one-liner:
from groupdocs.markdown import MarkdownConverter
# Convert to a string
md = MarkdownConverter.to_markdown("business-plan.docx")
# Or write directly to a file
MarkdownConverter.to_file("business-plan.docx", "business-plan.md")
That’s it — no configuration, no options, no boilerplate. Evaluation mode processes the first 3 pages and adds a watermark. To remove the limits, apply a license:
from groupdocs.markdown import License
License().set_license("path/to/license.lic")
Or set GROUPDOCS_LIC_PATH as an environment variable and it will be applied automatically on import.
Supported formats
The Python package handles the same breadth of formats as the .NET library:
- PDF: .pdf
- Word / Rich Text: .doc, .docx, .docm, .dot, .dotx, .dotm, .rtf, .odt, .ott
- Spreadsheets: .xls, .xlsx, .xlsb, .xlsm, .csv, .tsv, .ods, .ots
- eBooks: .epub, .mobi
- Text / Markup / Help: .txt, .xml, .chm
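When pointing the converter at a mixed folder, it can help to filter by extension first. The sketch below copies the extensions from the list above into a set and pairs each supported file with a .md target path. The names SUPPORTED_EXTENSIONS, is_supported, and plan_conversions are illustrative helpers, not part of the library, and the sketch assumes the extension alone is a good enough signal.

```python
from pathlib import Path

# Extensions copied from the list above; this set is an illustrative helper,
# not something the library exposes.
SUPPORTED_EXTENSIONS = {
    ".pdf",
    ".doc", ".docx", ".docm", ".dot", ".dotx", ".dotm", ".rtf", ".odt", ".ott",
    ".xls", ".xlsx", ".xlsb", ".xlsm", ".csv", ".tsv", ".ods", ".ots",
    ".epub", ".mobi",
    ".txt", ".xml", ".chm",
}


def is_supported(path):
    """Return True if the file extension appears in the list above."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def plan_conversions(source_dir, output_dir):
    """Pair each supported file under source_dir with a .md target path."""
    source_dir, output_dir = Path(source_dir), Path(output_dir)
    return [
        (src, output_dir / src.relative_to(source_dir).with_suffix(".md"))
        for src in sorted(source_dir.rglob("*"))
        if src.is_file() and is_supported(src)
    ]
```

Each (source, target) pair can then be passed to MarkdownConverter.to_file. Note that two sources sharing a stem in the same folder (say report.pdf and report.docx) map to the same target, so a real pipeline would need its own naming rule.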
Pythonic examples
Conversion options and image strategies
from groupdocs.markdown import (
    MarkdownConverter,
    ConvertOptions,
    MarkdownFlavor,
    ExportImagesToFileSystemStrategy,
)
strategy = ExportImagesToFileSystemStrategy("output/images")
strategy.images_relative_path = "images"
options = ConvertOptions()
options.flavor = MarkdownFlavor.GIT_HUB
options.heading_level_offset = 1  # "# Title" becomes "## Title"
options.include_front_matter = True  # prepend YAML metadata
options.image_export_strategy = strategy
MarkdownConverter.to_file("report.docx", "output/report.md", convert_options=options)
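To make the effect of heading_level_offset concrete, here is a small standalone sketch of the same transformation applied to a Markdown string. It is an illustration only, not the library's implementation: shift_headings is a hypothetical helper, and capping at level 6 and skipping fenced code are assumptions of this sketch.

```python
import re

# An ATX heading: one to six '#' characters, whitespace, then some text.
_HEADING = re.compile(r"^(#{1,6})(\s+\S.*)$")


def shift_headings(markdown, offset=1):
    """Deepen every ATX heading by `offset` levels, capped at level 6."""
    shifted = []
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # leave '#' comments inside code fences alone
        match = None if in_fence else _HEADING.match(line)
        if match:
            level = min(len(match.group(1)) + offset, 6)
            line = "#" * level + match.group(2)
        shifted.append(line)
    return "\n".join(shifted)
```

With offset=1, a document whose top heading is "# Title" comes out with "## Title", which is what makes the output easy to nest under a heading of your own.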
Document inspection without conversion
from groupdocs.markdown import MarkdownConverter
info = MarkdownConverter.get_info("business-plan.docx")
print(f"{info.file_format}, {info.page_count} pages, author: {info.author}")
Loading a password-protected file
from groupdocs.markdown import MarkdownConverter, LoadOptions, FileFormat
load_opts = LoadOptions(FileFormat.DOCX)
load_opts.password = "secret"
MarkdownConverter.to_file("protected.docx", "output.md", load_options=load_opts)
Streams and context managers
from groupdocs.markdown import MarkdownConverter
with open("document.docx", "rb") as stream:
    with MarkdownConverter(stream) as converter:
        converter.convert("document.md")
Async API — converting many documents concurrently
Because file I/O is asynchronous, asyncio.gather() lets a single worker process many documents without blocking:
import asyncio
from groupdocs.markdown import MarkdownConverter
async def convert_many():
    await asyncio.gather(
        MarkdownConverter.to_file_async("a.docx", "a.md", None),
        MarkdownConverter.to_file_async("b.pdf", "b.md", None),
        MarkdownConverter.to_file_async("c.xlsx", "c.md", None),
    )
asyncio.run(convert_many())
This makes the library a natural fit for ASGI frameworks like FastAPI: a single worker can serve many concurrent conversion requests without blocking the event loop.
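For larger batches you may not want every conversion in flight at once. The following is a minimal sketch of bounded concurrency with asyncio.Semaphore. The helper convert_all and its limit parameter are hypothetical; the conversion coroutine is passed in as an argument so the pattern can be tried without the library installed.

```python
import asyncio


async def convert_all(jobs, convert, limit=4):
    """Run convert(src, dst) for every job, at most `limit` at a time.

    Returns a list of (src, error) pairs in job order; error is None on success.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run_one(src, dst):
        async with semaphore:
            try:
                await convert(src, dst)
                return src, None
            except Exception as exc:  # keep going; report the failure to the caller
                return src, exc

    return await asyncio.gather(*(run_one(src, dst) for src, dst in jobs))


# Usage with the library (assumes groupdocs-markdown-net is installed):
#   from groupdocs.markdown import MarkdownConverter
#   convert = lambda src, dst: MarkdownConverter.to_file_async(src, dst, None)
#   results = asyncio.run(convert_all([("a.docx", "a.md"), ("b.pdf", "b.md")], convert))
```

Catching the exception per job means one corrupt file does not cancel the rest of the batch; the caller decides what to do with the collected errors.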
Error handling
All conversion methods raise on failure, with specific exception types for common scenarios:
from groupdocs.markdown import (
    MarkdownConverter,
    DocumentProtectedException,
    InvalidFormatException,
    GroupDocsMarkdownException,
)
try:
    MarkdownConverter.to_file("annual-report.docx", "annual-report.md")
except DocumentProtectedException:
    print("Wrong or missing password.")
except InvalidFormatException:
    print("File is corrupt or unsupported.")
except GroupDocsMarkdownException as ex:
    print(f"Conversion failed: {ex}")
Built for RAG and LLM pipelines
Markdown is the preferred input format for embedding models and retrieval pipelines — it preserves headings, lists, tables, and emphasis while being easy to chunk and tokenize. A typical RAG ingestion looks like this:
import re
from groupdocs.markdown import MarkdownConverter, ConvertOptions, SkipImagesStrategy, MarkdownFlavor
options = ConvertOptions()
options.image_export_strategy = SkipImagesStrategy() # text-only for RAG
options.flavor = MarkdownFlavor.COMMON_MARK
MarkdownConverter.to_file("business-plan.pdf", "business-plan.md", convert_options=options)
with open("business-plan.md", "r", encoding="utf-8") as f:
    markdown = f.read()
# Split on level-1 and level-2 headings (the split consumes the "#" markers),
# then embed/index each chunk
chunks = [c for c in re.split(r"\n#{1,2} ", markdown) if c.strip()]
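The re.split call above discards the heading markers and misses a heading on the very first line of the file. If you want each chunk to keep its own heading line (often useful as retrieval context), one possible variant is sketched below. split_by_headings is a hypothetical helper, and it assumes no line inside a fenced code block starts with "# " or "## ".

```python
import re

# A section starts at a line beginning with one or two '#' followed by a space.
_SECTION_START = re.compile(r"^#{1,2} ", re.MULTILINE)


def split_by_headings(markdown):
    """Split Markdown into sections that each begin with their heading line."""
    starts = [m.start() for m in _SECTION_START.finditer(markdown)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any text that precedes the first heading
    bounds = starts + [len(markdown)]
    sections = (markdown[a:b].strip() for a, b in zip(bounds, bounds[1:]))
    return [s for s in sections if s]
```

Level-3 and deeper headings stay inside their parent section, so each chunk carries a complete sub-tree of the document.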
Because the library runs entirely on-premise, sensitive documents never leave your environment — a common requirement for regulated industries, legal teams, and internal knowledge bases.
AI-friendly by design
Most Python SDKs treat AI coding assistants as an afterthought — a developer still has to point the agent at documentation, paste in examples, or debug through trial and error. GroupDocs.Markdown for Python via .NET flips that: the library is designed so that agents like Claude Code, Cursor, GitHub Copilot, and Codex can pick it up without any manual setup.
AGENTS.md ships inside the wheel
This is the first GroupDocs package to bundle an AGENTS.md file directly inside the installed wheel. The file follows the emerging AGENTS.md convention — a plain-Markdown README specifically written for AI coding assistants rather than humans.
When you run pip install groupdocs-markdown-net, a file lands at:
site-packages/groupdocs/markdown/AGENTS.md
An AI assistant opening your project can read it and immediately learn:
- The full public API surface (classes, methods, enums, exceptions) and how they relate.
- Idiomatic usage patterns for the most common scenarios — static vs instance API, sync vs async, image strategies, front matter, error handling.
- Common pitfalls and how to avoid them, e.g. which ConvertOptions overloads accept None, how to handle password-protected files, how to capture conversion warnings.
- Troubleshooting for platform-specific issues (libSkiaSharp on macOS, ICU on Linux).
In practice this means you can say “use groupdocs-markdown-net to convert this folder of PDFs to Markdown for my RAG pipeline” and the agent writes working code on the first try — no hallucinated method names, no wrong argument order, no guessed imports.
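If you want to confirm the file is present in your environment, or hand its path to a tool explicitly, a short lookup along these lines should work. It is a sketch that searches each sys.path entry for the relative location quoted above; find_agents_md is a hypothetical helper, not something the package provides.

```python
import sys
from pathlib import Path

# Relative location quoted above; the rest of this sketch is illustrative.
AGENTS_RELATIVE_PATH = Path("groupdocs") / "markdown" / "AGENTS.md"


def find_agents_md(search_roots=None):
    """Return the first AGENTS.md found under the given roots, or None."""
    roots = sys.path if search_roots is None else search_roots
    for root in roots:
        candidate = Path(root or ".") / AGENTS_RELATIVE_PATH
        if candidate.is_file():
            return candidate
    return None
```

Calling find_agents_md() with no arguments searches the active interpreter's import path, so run it inside the virtual environment where the wheel is installed.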
Machine-readable documentation
For agents that need to look up something that isn’t in AGENTS.md, the full product documentation is also published in machine-readable form:
- Single-file corpus. The complete docs as one concatenated Markdown file, ready to drop into an agent’s context window: https://docs.groupdocs.com/markdown/python-net/llms-full.txt
- Per-page Markdown. Append .md to any docs URL to fetch the raw source: https://docs.groupdocs.com/markdown/python-net/quick-start-guide.md
- llms.txt index. An llms.txt-style table of contents that points agents at the pages they need: https://docs.groupdocs.com/markdown/python-net/llms.txt
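For scripts that pull these resources themselves, the URL rules above can be sketched with the standard library alone. The helper names markdown_url and fetch_text are hypothetical, and stripping a trailing slash before appending .md is an assumption about how page URLs are shaped.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen

# URLs taken from the list above.
DOCS_BASE = "https://docs.groupdocs.com/markdown/python-net/"
LLMS_INDEX_URL = DOCS_BASE + "llms.txt"
LLMS_FULL_URL = DOCS_BASE + "llms-full.txt"


def markdown_url(page_url):
    """Turn a docs page URL into its raw-Markdown URL by appending '.md'."""
    parts = urlsplit(page_url)
    path = parts.path.rstrip("/")  # assumption: a trailing slash is not significant
    if not path.endswith(".md"):
        path += ".md"
    # Drop query and fragment: they address the rendered page, not the source.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def fetch_text(url, timeout=10):
    """Download a URL and decode it as UTF-8 (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, fetch_text(LLMS_FULL_URL) would return the whole corpus as one string, and fetch_text(markdown_url(DOCS_BASE + "quick-start-guide")) the source of a single page.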
MCP server for live doc lookups
For agents that speak Model Context Protocol, we expose the docs as an MCP server. Add this to your Claude Code or Cursor config:
{
  "mcpServers": {
    "groupdocs-docs": {
      "url": "https://docs.groupdocs.com/mcp"
    }
  }
}
After that, your agent can query the documentation on demand instead of relying on training data that may be stale.
Markdown in, Markdown out
There’s a nice symmetry here: the library’s output is Markdown — the format LLMs parse best for RAG — and its documentation is also Markdown, served as a single file for easy context-window ingestion. Whether you’re asking an agent to write code that uses the library, or asking an agent to understand your documents via the library, Markdown is the common medium.
Export Example
The snippets above are close to the shortest useful program you can write with the library. Here’s the same idea packaged as a runnable project — source document, Python script, pre-generated output, requirements.txt, and a Dockerfile — so you can try it end-to-end without writing anything from scratch.
Source DOCX
The source file business-plan.docx is a short, richly-formatted business plan with headings, tables, images, and metadata.
Python script
from groupdocs.markdown import MarkdownConverter, ConvertOptions, MarkdownFlavor
def quick_example():
    """Convert a Word document to Markdown with GitHub flavor and YAML front matter."""
    # One-liner: returns a Markdown string
    md = MarkdownConverter.to_markdown("business-plan.docx")
    # With options: writes to a file
    options = ConvertOptions()
    options.flavor = MarkdownFlavor.GIT_HUB
    options.include_front_matter = True
    options.heading_level_offset = 1
    MarkdownConverter.to_file("business-plan.docx", "quick-example.md", convert_options=options)
if __name__ == "__main__":
    quick_example()
Output Markdown
The quick-example.md output starts with a YAML front-matter block auto-extracted from the document metadata, followed by the converted content with GitHub Flavored tables and a shifted heading hierarchy (ready to embed inside a larger document).
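If a downstream step needs the metadata and the body separately, the front-matter block can be peeled off with a few lines of plain Python. This is a rough sketch that assumes the block is delimited by --- lines and contains flat key: value pairs; split_front_matter is a hypothetical helper, and the exact keys the library emits are not shown here.

```python
def split_front_matter(markdown):
    """Split a leading '---' delimited block from the rest of the document.

    Returns (fields, body). Only flat 'key: value' lines are parsed; use a
    YAML parser for nested or multi-line values.
    """
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, markdown
    for end, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            break
    else:
        return {}, markdown  # no closing delimiter: treat everything as body
    fields = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip().strip("\"'")
    body = "\n".join(lines[end + 1:]).lstrip("\n")
    return fields, body
```

The returned fields can be stored as chunk metadata in a vector index while only the body is embedded.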
Runnable sample app
Everything bundled together: sample-app.zip. Unzip, then:
cd src
python -m venv .venv
# Windows: .venv\Scripts\activate
# Linux/macOS: source .venv/bin/activate
pip install -r requirements.txt
python quick_example.py
Or run it in Docker — the included Dockerfile sets up the ICU dependency the bundled .NET runtime needs on Linux:
cd src
docker build -t groupdocs-markdown-python-example .
docker run --rm -v "$(pwd)/output:/app/output" groupdocs-markdown-python-example
Summary
GroupDocs.Markdown for Python via .NET brings the full document-to-Markdown conversion engine to Python as a self-contained wheel — no external runtime, no cloud, no surprises. A pythonic API, async support, and first-class AI tooling integration make it a practical choice for Python teams building RAG systems, static site generators, or document processing pipelines.
Learn more
- PyPI package: https://pypi.org/project/groupdocs-markdown-net/
- Product home: https://products.groupdocs.com/markdown/python-net/
- Documentation: https://docs.groupdocs.com/markdown/python-net/
- Release notes: https://releases.groupdocs.com/markdown/python-net/release-notes/
- Code examples on GitHub: https://github.com/groupdocs-markdown/GroupDocs.Markdown-for-Python-via-.NET
- License information: https://about.groupdocs.com/legal/
- Related .NET release post: GroupDocs.Markdown for .NET — First Public Release
Support & feedback
For questions or technical assistance, please use our Free Support Forum — we’ll be happy to help.