Define Default Namespace (unprefixed) In Lxml

When rendering XHTML with lxml, everything is fine unless you happen to use Firefox, which seems unable to deal with namespace-prefixed XHTML elements and JavaScript. While Opera

Solution 1:

Use ElementMaker and give it an nsmap that maps None to your default namespace.

#!/usr/bin/env python
# dogeml.py
from lxml.builder import ElementMaker
from lxml import etree

E = ElementMaker(
    nsmap={
        None: "http://wow/"  # <--- This is the special sauce
    }
)

doge = E.doge(
    E.such('markup'),
    E.many('very namespaced', syntax="tricks")
)

options = {
    'pretty_print': True,
    'xml_declaration': True,
    'encoding': 'UTF-8',
}

serialized_bytes = etree.tostring(doge, **options)
print(serialized_bytes.decode(options['encoding']))

As you can see in the output from this script, the default namespace is defined, but the tags do not have a prefix.

<?xml version='1.0' encoding='UTF-8'?>
<doge xmlns="http://wow/">
  <such>markup</such>
  <many syntax="tricks">very namespaced</many>
</doge>

I have tested this code with Python 2.7.6, 3.3.5, and 3.4.0, combined with lxml 3.3.1.
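One caveat, offered as a sketch rather than as part of the original answer: as far as I can tell, passing only nsmap to ElementMaker declares the default namespace on the root, but the in-memory tags stay unqualified until the output is parsed again. If you also need the tree itself to carry the namespace (for XPath or tag comparisons, say), ElementMaker accepts a namespace argument as well. The URI below is the same placeholder used in the script above.

```python
from lxml import etree
from lxml.builder import ElementMaker

NS = "http://wow/"  # same placeholder URI as in the script above

# namespace= qualifies every tag the maker creates;
# nsmap={None: NS} makes that namespace the default, so no prefix is emitted.
E = ElementMaker(namespace=NS, nsmap={None: NS})

doge = E.doge(
    E.such("markup"),
    E.many("very namespaced", syntax="tricks"),
)

print(doge.tag)     # tag in Clark notation, namespace included
print(doge.prefix)  # None, because the namespace is the default one
print(etree.tostring(doge, pretty_print=True).decode("utf-8"))
```

The serialized markup should still be unprefixed; the difference is that doge.tag and the child tags are namespace-qualified before any round trip through a parser.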

Solution 2:

This XSL transformation removes all prefixes from content, while maintaining namespaces defined in the root node:

import lxml.etree as ET

# Bytes literal: on Python 3, lxml rejects str input that carries an
# encoding declaration, so the document is passed as bytes.
content = b'''\
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE html>
<h:html xmlns:h="http://www.w3.org/1999/xhtml" xmlns:ml="http://foo">
  <h:head>
    <h:title>MathJax Test Page</h:title>
    <h:script type="text/javascript"><![CDATA[
      function test() {
        alert(document.getElementsByTagName("p").length);
      };
    ]]></h:script>
  </h:head>
  <h:body onload="test();">
    <h:p>test</h:p>
    <ml:foo></ml:foo>
  </h:body>
</h:html>
'''
dom = ET.fromstring(content)

xslt = '''\
<xsl:stylesheet version="1.0"
                xmlns="http://www.w3.org/1999/xhtml"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="no"/>

  <!-- identity transform for everything else -->
  <xsl:template match="/|comment()|processing-instruction()|*|@*">
    <xsl:copy>
      <xsl:apply-templates />
    </xsl:copy>
  </xsl:template>

  <!-- remove NS from XHTML elements -->
  <xsl:template match="*[namespace-uri() = 'http://www.w3.org/1999/xhtml']">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates select="@*|node()" />
    </xsl:element>
  </xsl:template>

  <!-- remove NS from XHTML attributes -->
  <xsl:template match="@*[namespace-uri() = 'http://www.w3.org/1999/xhtml']">
    <xsl:attribute name="{local-name()}">
      <xsl:value-of select="." />
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>
'''

xslt_doc = ET.fromstring(xslt)
transform = ET.XSLT(xslt_doc)
dom = transform(dom)

# tostring() returns bytes when an encoding is given; decode for printing.
print(ET.tostring(dom, pretty_print=True,
                  encoding='utf-8').decode('utf-8'))

yields

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>MathJax Test Page</title>
    <script type="text/javascript">
      function test() {
        alert(document.getElementsByTagName("p").length);
      };
    </script>
  </head>
  <body onload="test();">
    <p>test</p>
    <ml:foo xmlns:ml="http://foo"/>
  </body>
</html>
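It may help to see that dropping the prefix does not change which namespace the elements belong to. The following is a minimal sketch with two hand-written fragments (not the full page above) and uses the standard library parser only so that it runs without lxml; any namespace-aware parser should report the same qualified names. The helper qualified_tags is a hypothetical name introduced for this illustration.

```python
import xml.etree.ElementTree as StdET

# Two spellings of the same document: prefixed vs. default namespace.
prefixed = '<h:html xmlns:h="http://www.w3.org/1999/xhtml"><h:p>test</h:p></h:html>'
unprefixed = '<html xmlns="http://www.w3.org/1999/xhtml"><p>test</p></html>'


def qualified_tags(xml_text):
    """Return every element tag in Clark notation, in document order."""
    return [element.tag for element in StdET.fromstring(xml_text).iter()]


tags_prefixed = qualified_tags(prefixed)
tags_unprefixed = qualified_tags(unprefixed)

print(tags_prefixed)
print(tags_prefixed == tags_unprefixed)
```

Both fragments parse to the same namespace-qualified tags, which is why the transformation is safe for consumers that are namespace-aware and only changes behaviour for code that looks at the literal tag text.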

Solution 3:

To expand on @neirbowj's answer, but using ET.Element and ET.SubElement, and rendering a document with a mix of namespaces, where the root happens to be explicitly namespaced and a subelement (channel) is the default namespace:

import lxml.etree as ET

# I set up but don't use the default namespace:
root = ET.Element('{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF',
                  nsmap={None: 'http://purl.org/rss/1.0/'})
# I use the default namespace by including its URL in curly braces:
e = ET.SubElement(root, '{http://purl.org/rss/1.0/}channel')
print(ET.tostring(root, xml_declaration=True, encoding='utf8').decode())

This will print out the following:

<?xml version='1.0' encoding='utf8'?>
<rdf:RDF xmlns="http://purl.org/rss/1.0/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"><channel/></rdf:RDF>

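It automatically uses rdf for the RDF namespace because lxml keeps a small built-in table of preferred prefixes for well-known namespace URIs, and the RDF syntax namespace is one of them. If I want to specify the prefix myself, I can add it to the nsmap of the root element: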

nsmap = {None: 'http://purl.org/rss/1.0/',
         'doge': 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'}
root = ET.Element('{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF', nsmap=nsmap)
e = ET.SubElement(root, '{http://purl.org/rss/1.0/}channel')
print(ET.tostring(root, xml_declaration=True, encoding='utf8').decode())

...and I get this:

<?xml version='1.0' encoding='utf8'?>
<doge:RDF xmlns:doge="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://purl.org/rss/1.0/"><channel/></doge:RDF>
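To check which prefix lxml has picked without reading the serialized text, you can inspect the elements directly. This is a small sketch that rebuilds the same two-element tree as above; the constant names RSS and RDF are introduced here only for readability.

```python
import lxml.etree as ET

RSS = "http://purl.org/rss/1.0/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Same tree as above: explicitly prefixed root, child in the default namespace.
root = ET.Element("{%s}RDF" % RDF, nsmap={None: RSS, "doge": RDF})
channel = ET.SubElement(root, "{%s}channel" % RSS)

print(root.prefix)     # prefix chosen for the root element
print(channel.prefix)  # None when the element is in the default namespace

# QName splits a Clark-notation tag into its namespace and local name.
qname = ET.QName(channel)
print(qname.namespace, qname.localname)
```

A prefix of None on an element that still has a namespace in its QName is exactly the "default namespace, unprefixed" situation this page is about.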
