Definition
The use of the python-docx module (imported as docx) to create and manipulate Microsoft Word (.docx) files.
Why It Matters
Administrative work is the “friction” of modern life. Automating Word with Python lets one person generate or reformat large batches of documents from a single script, turning hours of repetitive manual formatting into a few seconds of execution.
Core Concepts
- The Hierarchy Model:
  - Document: The entire file.
  - Paragraph: A single block of text, ended by a paragraph mark (what pressing Enter inserts in Word).
  - Run: A contiguous string of text with a consistent style (bold, font, etc.). Every time the style changes, a new Run begins.
- Reading: Iterate through doc.paragraphs, and for each paragraph, iterate through its runs.
- Writing: add_paragraph(), add_run(), add_heading(), add_picture(), and add_break().
- Styling: Controlled at the Run level. Attributes use Three-State Logic: True (On), False (Off), or None (Inherit from Paragraph/Document).
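import docx
# Start from python-docx's built-in blank template.
doc = docx.Document()
doc.add_paragraph('Hello, this is an automated Word document.')
# Build a paragraph from two runs so that only the second run is bold.
para = doc.add_paragraph('This is ')
para.add_run('bolded text').bold = True
# Write the result to disk (overwrites any existing file of the same name).
doc.save('automated.docx')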