Bring your documents into AI pipelines, directly from Python, on-premise, with a single pip install.
Today we are publishing the first public release of GroupDocs.Markdown for Python via .NET on PyPI. The library converts PDF, Word, Excel, EPUB, and more than 20 other formats into clean, semantic Markdown, the format that LLMs, RAG pipelines, and static site generators work with best.
If you followed the .NET release in September (or the complete API overhaul in version 26.3), the reasoning is the same: a document's formatting carries semantics, and preserving that semantic structure is what lets a RAG system give good answers. The previous post covers the problem (OCR flattens structure, LLMs need Markdown) and the solution (a DOM-based renderer that walks the document and emits Markdown) in depth, so we won't repeat that story here.
Instead, let's focus on what's new for Python developers.
What you get
- A single wheel, no runtime dependencies. pip install groupdocs-markdown-net pulls a self-contained wheel that bundles the .NET runtime and every native library it needs. No dotnet install, no Microsoft Office, no Adobe Acrobat, no cloud services.
- Cross-platform. Windows x64/x86, Linux x64, macOS x64 and Apple Silicon (ARM64). Python 3.5 through 3.14.
- A pythonic API. Classes use PascalCase, methods and properties use snake_case, enum values use UPPER_SNAKE_CASE. Context managers dispose loaded documents deterministically.
- Truly async. Every static and instance method has an _async counterpart. File I/O is asynchronous and the CPU-bound conversion runs on a worker thread, so your asyncio event loop stays free.
- AI-agent friendly. The installed wheel bundles an AGENTS.md file so coding assistants (Claude Code, Cursor, GitHub Copilot, Codex) auto-discover the API surface, idiomatic usage patterns, and troubleshooting tips. Documentation is also published as llms.txt, a single-file corpus (llms-full.txt), per-page Markdown, and an MCP server; see AI-friendly by design below for details.
Get started
pip install groupdocs-markdown-net
The simplest conversion is a one-liner:
from groupdocs.markdown import MarkdownConverter
# Convert to a string
md = MarkdownConverter.to_markdown("business-plan.docx")
# Or write directly to a file
MarkdownConverter.to_file("business-plan.docx", "business-plan.md")
That's it: no configuration, no options, no boilerplate. Evaluation mode processes the first 3 pages and adds a watermark. To lift those limits, apply a license:
from groupdocs.markdown import License
License().set_license("path/to/license.lic")
Or set GROUPDOCS_LIC_PATH as an environment variable; it is applied automatically at import time.
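Because the license is picked up at import time, it can be useful to check the variable before importing the package. The sketch below uses only the standard library; `license_status` is a hypothetical helper, not part of the library, and it only checks that the path exists without validating the license itself.

```python
import os
from pathlib import Path


def license_status(env=None):
    """Describe whether GROUPDOCS_LIC_PATH points at an existing file.

    Hypothetical helper for illustration; it does not validate the license.
    """
    env = os.environ if env is None else env
    raw = env.get("GROUPDOCS_LIC_PATH", "").strip()
    if not raw:
        return "unset"
    return "found" if Path(raw).is_file() else "missing"


if __name__ == "__main__":
    print(f"GROUPDOCS_LIC_PATH: {license_status()}")
```

Running this as a startup check in a container or CI job makes a missing license file visible early, rather than as a watermark in the output.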
Supported formats
The Python package handles the same range of formats as the .NET library:
- PDF: .pdf
- Word / Rich Text: .doc, .docx, .docm, .dot, .dotx, .dotm, .rtf, .odt, .ott
- Spreadsheets: .xls, .xlsx, .xlsb, .xlsm, .csv, .tsv, .ods, .ots
- eBooks: .epub, .mobi
- Text / Markup / Help: .txt, .xml, .chm
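When converting a whole folder, it may help to filter out files the converter will not accept before calling it. The following is a minimal sketch using only the standard library; the extension set is copied from the list above, and `is_supported` and `convertible_files` are hypothetical helper names, not part of the library.

```python
from pathlib import Path

# Extensions copied from the list above; adjust if the release notes add more.
SUPPORTED_EXTENSIONS = frozenset({
    ".pdf",
    ".doc", ".docx", ".docm", ".dot", ".dotx", ".dotm", ".rtf", ".odt", ".ott",
    ".xls", ".xlsx", ".xlsb", ".xlsm", ".csv", ".tsv", ".ods", ".ots",
    ".epub", ".mobi",
    ".txt", ".xml", ".chm",
})


def is_supported(path):
    """Return True if the file extension is in the set above (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def convertible_files(folder):
    """Return the files directly inside `folder` with a listed extension, sorted."""
    return sorted(p for p in Path(folder).iterdir() if p.is_file() and is_supported(p))
```

The check is by extension only, so a mislabeled file can still fail during conversion.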
Pythonic examples
Conversion options and image strategies
from groupdocs.markdown import (
    MarkdownConverter,
    ConvertOptions,
    MarkdownFlavor,
    ExportImagesToFileSystemStrategy,
)
strategy = ExportImagesToFileSystemStrategy("output/images")
strategy.images_relative_path = "images"
options = ConvertOptions()
options.flavor = MarkdownFlavor.GIT_HUB
options.heading_level_offset = 1 # # Title -> ## Title
options.include_front_matter = True # prepend YAML metadata
options.image_export_strategy = strategy
MarkdownConverter.to_file("report.docx", "output/report.md", convert_options=options)
Document inspection without conversion
from groupdocs.markdown import MarkdownConverter
info = MarkdownConverter.get_info("business-plan.docx")
print(f"{info.file_format}, {info.page_count} pages, author: {info.author}")
Loading a password-protected file
from groupdocs.markdown import MarkdownConverter, LoadOptions, FileFormat
load_opts = LoadOptions(FileFormat.DOCX)
load_opts.password = "secret"
MarkdownConverter.to_file("protected.docx", "output.md", load_options=load_opts)
Streams and context managers
from groupdocs.markdown import MarkdownConverter
with open("document.docx", "rb") as stream:
    with MarkdownConverter(stream) as converter:
        converter.convert("document.md")
Async API — converting many documents concurrently
Because file I/O is asynchronous, asyncio.gather() lets a single worker process many documents without blocking:
import asyncio
from groupdocs.markdown import MarkdownConverter
async def convert_many():
    await asyncio.gather(
        MarkdownConverter.to_file_async("a.docx", "a.md", None),
        MarkdownConverter.to_file_async("b.pdf", "b.md", None),
        MarkdownConverter.to_file_async("c.xlsx", "c.md", None),
    )

asyncio.run(convert_many())
This makes the library a natural fit for ASGI frameworks like FastAPI — a single worker can serve many concurrent conversion requests without thread contention.
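For larger batches, starting every conversion at once with asyncio.gather() may use more memory and threads than intended. One common approach is to cap concurrency with an asyncio.Semaphore. The sketch below is self-contained: `convert_all` and `fake_convert` are hypothetical names, and `fake_convert` is a stand-in so the example runs without the library. In real use, the stand-in would presumably be replaced by a coroutine that awaits MarkdownConverter.to_file_async(src, dst, None) as shown above.

```python
import asyncio


async def convert_all(pairs, convert_one, limit=4):
    """Run convert_one(src, dst) for every pair, at most `limit` at a time.

    Returns a list of (src, error) tuples for the conversions that raised.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(src, dst):
        async with semaphore:
            try:
                await convert_one(src, dst)
                return None
            except Exception as exc:  # record the failure, keep the batch going
                return (src, exc)

    results = await asyncio.gather(*(guarded(s, d) for s, d in pairs))
    return [r for r in results if r is not None]


async def fake_convert(src, dst):
    """Stand-in for the real converter so the sketch runs without the library."""
    await asyncio.sleep(0.01)
    if src.endswith(".bad"):
        raise ValueError(f"cannot convert {src}")


if __name__ == "__main__":
    pairs = [("a.docx", "a.md"), ("b.pdf", "b.md"), ("c.bad", "c.md")]
    failures = asyncio.run(convert_all(pairs, fake_convert, limit=2))
    print([src for src, _ in failures])
```

Collecting failures instead of letting the first exception propagate means one unreadable file does not abort the rest of the batch.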
Error handling
All conversion methods raise on failure, with specific exception types for common scenarios:
from groupdocs.markdown import (
    MarkdownConverter,
    DocumentProtectedException,
    InvalidFormatException,
    GroupDocsMarkdownException,
)

try:
    MarkdownConverter.to_file("annual-report.docx", "annual-report.md")
except DocumentProtectedException:
    print("Wrong or missing password.")
except InvalidFormatException:
    print("File is corrupt or unsupported.")
except GroupDocsMarkdownException as ex:
    print(f"Conversion failed: {ex}")
Built for RAG and LLM pipelines
Markdown is the preferred input format for embedding models and retrieval pipelines — it preserves headings, lists, tables, and emphasis while being easy to chunk and tokenize. A typical RAG ingestion looks like this:
import re
from groupdocs.markdown import MarkdownConverter, ConvertOptions, SkipImagesStrategy, MarkdownFlavor
options = ConvertOptions()
options.image_export_strategy = SkipImagesStrategy() # text-only for RAG
options.flavor = MarkdownFlavor.COMMON_MARK
MarkdownConverter.to_file("business-plan.pdf", "business-plan.md", convert_options=options)
with open("business-plan.md", "r", encoding="utf-8") as f:
    markdown = f.read()
# Split on H1 and H2 headings, then embed/index each chunk
chunks = [c for c in re.split(r"\n#{1,2} ", markdown) if c.strip()]
Because the library runs entirely on-premise, sensitive documents never leave your environment — a common requirement for regulated industries, legal teams, and internal knowledge bases.
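The one-line re.split above discards the heading markers, so each chunk loses its title. A slightly longer sketch that keeps the heading text next to each chunk might look like the following. It uses only the standard library; `chunk_by_heading` is a hypothetical helper, and it does not skip `#` lines inside fenced code blocks, which a production chunker would need to handle.

```python
import re

# Matches H1/H2 lines such as "# Title" or "## Section" at the start of a line.
HEADING = re.compile(r"^(#{1,2}) +(.+?)\s*$", re.MULTILINE)


def chunk_by_heading(markdown):
    """Split Markdown at H1/H2 headings, returning (heading, body) tuples.

    Text before the first heading is returned with an empty heading.
    """
    chunks = []
    matches = list(HEADING.finditer(markdown))
    preamble_end = matches[0].start() if matches else len(markdown)
    preamble = markdown[:preamble_end].strip()
    if preamble:
        chunks.append(("", preamble))
    for current, following in zip(matches, matches[1:] + [None]):
        end = following.start() if following else len(markdown)
        body = markdown[current.end():end].strip()
        chunks.append((current.group(2), body))
    return chunks
```

Storing the heading alongside the body gives the retriever a short label for each chunk, which can be embedded together with the text or kept as metadata.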
AI-friendly by design
Most Python SDKs treat AI coding assistants as an afterthought — a developer still has to point the agent at documentation, paste in examples, or debug through trial and error. GroupDocs.Markdown for Python via .NET flips that: the library is designed so that agents like Claude Code, Cursor, GitHub Copilot, and Codex can pick it up without any manual setup.
AGENTS.md ships inside the wheel
This is the first GroupDocs package to bundle an AGENTS.md file directly inside the installed wheel. The file follows the emerging AGENTS.md convention — a plain-Markdown README specifically written for AI coding assistants rather than humans.
When you run pip install groupdocs-markdown-net, a file lands at:
site-packages/groupdocs/markdown/AGENTS.md
An AI assistant opening your project can read it and immediately learn:
- The full public API surface (classes, methods, enums, exceptions) and how they relate.
- Idiomatic usage patterns for the most common scenarios — static vs instance API, sync vs async, image strategies, front matter, error handling.
- Common pitfalls and how to avoid them, e.g. which ConvertOptions overloads accept None, how to handle password-protected files, how to capture conversion warnings.
- Troubleshooting for platform-specific issues (libSkiaSharp on macOS, ICU on Linux).
In practice this means you can say “use groupdocs-markdown-net to convert this folder of PDFs to Markdown for my RAG pipeline” and the agent writes working code on the first try — no hallucinated method names, no wrong argument order, no guessed imports.
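The exact site-packages location differs between virtual environments and platforms. To confirm where the file landed in a given environment, a small standard-library sketch such as the one below can locate it. `find_bundled_file` is a hypothetical helper; note that locating a dotted package name imports its parent package as a side effect.

```python
import importlib.util
from pathlib import Path


def find_bundled_file(package, filename="AGENTS.md"):
    """Return the path of `filename` inside an installed package, or None.

    Returns None when the package is not installed or the file is absent.
    """
    try:
        spec = importlib.util.find_spec(package)
    except (ImportError, ValueError):
        return None
    if spec is None or not spec.submodule_search_locations:
        return None
    for location in spec.submodule_search_locations:
        candidate = Path(location) / filename
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    print(find_bundled_file("groupdocs.markdown"))
```

The printed path can then be handed to an assistant that does not scan site-packages on its own.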
Machine-readable documentation
For agents that need to look up something that isn’t in AGENTS.md, the full product documentation is also published in machine-readable form:
- Single-file corpus (the complete docs as one concatenated Markdown file, ready to drop into an agent’s context window): https://docs.groupdocs.com/markdown/python-net/llms-full.txt
- Per-page Markdown (append .md to any docs URL to fetch the raw source): https://docs.groupdocs.com/markdown/python-net/quick-start-guide.md
- llms.txt index (an llms.txt-style table of contents that points agents at the pages they need): https://docs.groupdocs.com/markdown/python-net/llms.txt
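The per-page rule (append .md to a docs URL) is easy to apply in a script. Here is a minimal sketch using only the standard library; `markdown_url` is a hypothetical helper, and dropping a trailing slash, query string, and fragment before adding the suffix is an assumption about how the docs site resolves URLs, not something the documentation states.

```python
from urllib.parse import urlsplit, urlunsplit


def markdown_url(page_url):
    """Return the raw-Markdown URL for a docs page by appending ".md" to its path.

    Assumption: a trailing slash is dropped before the suffix is added.
    """
    parts = urlsplit(page_url)
    path = parts.path.rstrip("/")
    if not path.endswith(".md"):
        path += ".md"
    # Query string and fragment are discarded; they do not apply to the raw file.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))
```

The result can be passed to any HTTP client to fetch the page source for an agent's context.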
MCP server for live doc lookups
For agents that speak Model Context Protocol, we expose the docs as an MCP server. Add this to your Claude Code or Cursor config:
{
  "mcpServers": {
    "groupdocs-docs": {
      "url": "https://docs.groupdocs.com/mcp"
    }
  }
}
After that, your agent can query the documentation on demand instead of relying on training data that may be stale.
Markdown in, Markdown out
There’s a nice symmetry here: the library’s output is Markdown — the format LLMs parse best for RAG — and its documentation is also Markdown, served as a single file for easy context-window ingestion. Whether you’re asking an agent to write code that uses the library, or asking an agent to understand your documents via the library, Markdown is the common medium.
Export Example
The snippets above are close to the shortest useful program you can write with the library. Here’s the same idea packaged as a runnable project — source document, Python script, pre-generated output, requirements.txt, and a Dockerfile — so you can try it end-to-end without writing anything from scratch.
Source DOCX
The source file business-plan.docx is a short, richly-formatted business plan with headings, tables, images, and metadata.
Python script
from groupdocs.markdown import MarkdownConverter, ConvertOptions, MarkdownFlavor
def quick_example():
    """Convert a Word document to Markdown with GitHub flavor and YAML front matter."""
    # One-liner: returns a Markdown string
    md = MarkdownConverter.to_markdown("business-plan.docx")

    # With options: writes to a file
    options = ConvertOptions()
    options.flavor = MarkdownFlavor.GIT_HUB
    options.include_front_matter = True
    options.heading_level_offset = 1
    MarkdownConverter.to_file("business-plan.docx", "quick-example.md", convert_options=options)

if __name__ == "__main__":
    quick_example()
Output Markdown
The quick-example.md output starts with a YAML front-matter block auto-extracted from the document metadata, followed by the converted content with GitHub Flavored tables and a shifted heading hierarchy (ready to embed inside a larger document).
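If a downstream step needs the metadata separately from the content, the front-matter block can be split off with a few lines of standard-library Python. This sketch assumes the conventional layout in which the file starts with a `---` line and the block ends at the next `---` line; `split_front_matter` is a hypothetical helper, and it returns the raw lines rather than parsed values (a YAML parser such as PyYAML would be needed for that).

```python
def split_front_matter(markdown):
    """Split a Markdown string into (front_matter_lines, body).

    Assumes the first line is '---' and the block ends at the next '---' line.
    Returns ([], markdown) when no complete block is present.
    """
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return [], markdown
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            body = "\n".join(lines[index + 1:]).lstrip("\n")
            return lines[1:index], body
    return [], markdown  # opening marker without a closing one
```

For a RAG pipeline, the metadata lines can be attached to every chunk of the body as document-level context.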
Runnable sample app
Everything bundled together: sample-app.zip. Unzip, then:
cd src
python -m venv .venv
# Windows: .venv\Scripts\activate
# Linux/macOS: source .venv/bin/activate
pip install -r requirements.txt
python quick_example.py
Or run it in Docker — the included Dockerfile sets up the ICU dependency the bundled .NET runtime needs on Linux:
cd src
docker build -t groupdocs-markdown-python-example .
docker run --rm -v "$(pwd)/output:/app/output" groupdocs-markdown-python-example
Summary
GroupDocs.Markdown for Python via .NET brings the full document-to-Markdown conversion engine to Python as a self-contained wheel — no external runtime, no cloud, no surprises. A pythonic API, async support, and first-class AI tooling integration make it a practical choice for Python teams building RAG systems, static site generators, or document processing pipelines.
Learn more
- PyPI package: https://pypi.org/project/groupdocs-markdown-net/
- Product home: https://products.groupdocs.com/markdown/python-net/
- Documentation: https://docs.groupdocs.com/markdown/python-net/
- Release notes: https://releases.groupdocs.com/markdown/python-net/release-notes/
- Code examples on GitHub: https://github.com/groupdocs-markdown/GroupDocs.Markdown-for-Python-via-.NET
- License information: https://about.groupdocs.com/legal/
- Related .NET release post: GroupDocs.Markdown for .NET — First Public Release
Support & feedback
For questions or technical assistance, please use our Free Support Forum — we’ll be happy to help.