Text-replace In Docx And Save The Changed File With Python-docx
Solution 1:
this worked for me:
def docx_replace(old_file,new_file,rep):
zin = zipfile.ZipFile (old_file, 'r')
zout = zipfile.ZipFile (new_file, 'w')
for item in zin.infolist():
buffer = zin.read(item.filename)
if (item.filename == 'word/document.xml'):
res = buffer.decode("utf-8")
for r in rep:
res = res.replace(r,rep[r])
buffer = res.encode("utf-8")
zout.writestr(item, buffer)
zout.close()
zin.close()
Solution 2:
As it seems to be, Docx for Python is not meant to store a full Docx with images, headers, ... , but only contains the inner content of the document. So there's no simple way to do this.
Howewer, here is how you could do it:
First, have a look at the docx tag wiki:
It explains how the docx file can be unzipped: Here's how a typical file looks like:
+--docProps
| + app.xml
| \ core.xml
+ res.log
+--word //this folder contains most of the files that control the content of the document
| + document.xml//Is the actual content of the document
| + endnotes.xml
| + fontTable.xml
| + footer1.xml//Containst the elements in the footer of the document
| + footnotes.xml
| +--media //This folder contains all images embedded in the word
| | \ image1.jpeg
| + settings.xml
| + styles.xml
| + stylesWithEffects.xml
| +--theme
| | \ theme1.xml
| + webSettings.xml
| \--_rels
| \ document.xml.rels//this document tells word where the images are situated
+ [Content_Types].xml
\--_rels
\ .rels
Docx only gets one part of the document, in the method opendocx
defopendocx(file):
'''Open a docx file, return a document XML tree'''
mydoc = zipfile.ZipFile(file)
xmlcontent = mydoc.read('word/document.xml')
document = etree.fromstring(xmlcontent)
return document
It only gets the document.xml file.
What I recommend you to do is:
- get the content of the document with **opendocx*
- Replace the document.xml with the advReplace method
- Open the docx as a zip, and replace the document.xml content's by the new xml content.
- Close and output the zipped file (renaming it to output.docx)
If you have node.js installed, be informed that I have worked on DocxGenJS which is templating engine for docx documents, the library is in active development and will be released soon as a node module.
Solution 3:
Are you using the docx module from here?
If yes, then the docx module already exposes methods like replace, advReplace etc which can help you achieve your task. Refer to the source code for more details of the exposed methods.
Solution 4:
from docx import Document
file_path = 'C:/tmp.docx'
document = Document(file_path)
defdocx_replace(doc_obj, data: dict):
"""example: data=dict(order_id=123), result: {order_id} -> 123"""for paragraph in doc_obj.paragraphs:
for key, val in data.items():
key_name = '{{{}}}'.format(key)
if key_name in paragraph.text:
paragraph.text = paragraph.text.replace(key_name, str(val))
for table in doc_obj.tables:
for row in table.rows:
for cell in row.cells:
docx_replace(cell, data)
docx_replace(document, dict(order_id=123, year=2018, payer_fio='payer_fio', payer_fio1='payer_fio1'))
document.save(file_path)
Solution 5:
I've forked a repo of python-docx here, which preserves all of the preexisting data in a docx file, including formatting. Hopefully this is what you're looking for.
Post a Comment for "Text-replace In Docx And Save The Changed File With Python-docx"