Text-replace In Docx And Save The Changed File With Python-docx

June 28, 2023 Post a Comment

I'm trying to use the python-docx module to replace a word in a file and save the new file with the caveat that the new file must have exactly the same formatting as the old file,

Solution 1:

this worked for me:

def docx_replace(old_file,new_file,rep):
    zin = zipfile.ZipFile (old_file, 'r')
    zout = zipfile.ZipFile (new_file, 'w')
    for item in zin.infolist():
        buffer = zin.read(item.filename)
        if (item.filename == 'word/document.xml'):
            res = buffer.decode("utf-8")
            for r in rep:
                res = res.replace(r,rep[r])
            buffer = res.encode("utf-8")
        zout.writestr(item, buffer)
    zout.close()
    zin.close()

Solution 2:

As it seems to be, Docx for Python is not meant to store a full Docx with images, headers, ... , but only contains the inner content of the document. So there's no simple way to do this.

Howewer, here is how you could do it:

First, have a look at the docx tag wiki:

It explains how the docx file can be unzipped: Here's how a typical file looks like:

+--docProps
|  +  app.xml
|  \  core.xml
+  res.log
+--word //this folder contains most of the files that control the content of the document
|  +  document.xml//Is the actual content of the document
|  +  endnotes.xml
|  +  fontTable.xml
|  +  footer1.xml//Containst the elements in the footer of the document
|  +  footnotes.xml
|  +--media //This folder contains all images embedded in the word
|  |  \  image1.jpeg
|  +  settings.xml
|  +  styles.xml
|  +  stylesWithEffects.xml
|  +--theme
|  |  \  theme1.xml
|  +  webSettings.xml
|  \--_rels
|     \  document.xml.rels//this document tells word where the images are situated
+  [Content_Types].xml
\--_rels
   \  .rels

Docx only gets one part of the document, in the method opendocx

defopendocx(file):
    '''Open a docx file, return a document XML tree'''
    mydoc = zipfile.ZipFile(file)
    xmlcontent = mydoc.read('word/document.xml')
    document = etree.fromstring(xmlcontent)
    return document

It only gets the document.xml file.

What I recommend you to do is:

get the content of the document with **opendocx*
Replace the document.xml with the advReplace method
Open the docx as a zip, and replace the document.xml content's by the new xml content.
Close and output the zipped file (renaming it to output.docx)

If you have node.js installed, be informed that I have worked on DocxGenJS which is templating engine for docx documents, the library is in active development and will be released soon as a node module.

Solution 3:

Are you using the docx module from here?

If yes, then the docx module already exposes methods like replace, advReplace etc which can help you achieve your task. Refer to the source code for more details of the exposed methods.

Solution 4:

from docx import Document
file_path = 'C:/tmp.docx'
document = Document(file_path)

defdocx_replace(doc_obj, data: dict):
    """example: data=dict(order_id=123), result: {order_id} -> 123"""for paragraph in doc_obj.paragraphs:
        for key, val in data.items():
            key_name = '{{{}}}'.format(key)
            if key_name in paragraph.text:
                paragraph.text = paragraph.text.replace(key_name, str(val))
    for table in doc_obj.tables:
        for row in table.rows:
            for cell in row.cells:
                docx_replace(cell, data)

docx_replace(document, dict(order_id=123, year=2018, payer_fio='payer_fio', payer_fio1='payer_fio1'))
document.save(file_path)

Solution 5:

I've forked a repo of python-docx here, which preserves all of the preexisting data in a docx file, including formatting. Hopefully this is what you're looking for.

Python Manual