Define Default Namespace (unprefixed) In Lxml

When rendering XHTML with lxml, everything works fine unless you happen to use Firefox, which seems unable to deal with namespace-prefixed XHTML elements and JavaScript. While Opera

Solution 1:

Use ElementMaker and give it an nsmap that maps None to your default namespace.

#!/usr/bin/env python
# dogeml.py

from lxml.builder import ElementMaker
from lxml import etree

E = ElementMaker(
    nsmap={
        None: "http://wow/"  # <--- This is the special sauce
    }
)

doge = E.doge(
    E.such('markup'),
    E.many('very namespaced', syntax="tricks")
)

options = {
    'pretty_print': True,
    'xml_declaration': True,
    'encoding': 'UTF-8',
}

# tostring() returns bytes when an encoding is given; decode them for printing.
serialized_bytes = etree.tostring(doge, **options)
print(serialized_bytes.decode(options['encoding']))

As you can see in the output from this script, the default namespace is defined, but the tags do not have a prefix.

<?xml version='1.0' encoding='UTF-8'?>
<doge xmlns="http://wow/">
  <such>markup</such>
  <many syntax="tricks">very namespaced</many>
</doge>

I have tested this code with Python 2.7.6, 3.3.5, and 3.4.0, combined with lxml 3.3.1.
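As a rough sketch of how to check the result, the variant below builds the same tree and parses the serialized bytes back in. It also passes namespace= to ElementMaker, which is an addition not present in the script above, on the assumption that you want the in-memory tags to carry the namespace as well. The WOW_NS constant and the reparsed variable are names chosen here for illustration only.

```python
from lxml import etree
from lxml.builder import ElementMaker

WOW_NS = "http://wow/"  # same illustrative URI as in the answer above

# Assumption: namespace= puts every element created through E into WOW_NS,
# while nsmap={None: ...} declares WOW_NS as the default (unprefixed) namespace.
E = ElementMaker(namespace=WOW_NS, nsmap={None: WOW_NS})

doge = E.doge(
    E.such("markup"),
    E.many("very namespaced", syntax="tricks"),
)

serialized = etree.tostring(doge, encoding="UTF-8", xml_declaration=True)

# Parse the bytes back in to see which namespace the elements end up in.
reparsed = etree.fromstring(serialized)

print(doge.tag)
print(reparsed.tag)
print(serialized.decode("UTF-8"))
```

Comparing doge.tag with reparsed.tag is a quick way to confirm that the tree you built and the document a browser will parse agree on the namespace.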

Solution 2:

This XSL transformation removes all prefixes from content, while maintaining namespaces defined in the root node:

import lxml.etree as ET

content = '''\
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE html>
<h:html xmlns:h="" xmlns:ml="http://foo">
<h:head>
<h:title>MathJax Test Page</h:title>
<h:script type="text/javascript"><![CDATA[
      function test() {
dom = ET.fromstring(content)

xslt = '''\
<xsl:stylesheet version="1.0"
    xmlns=""
    xmlns:xsl="">
  <xsl:output method="xml" indent="no"/>

  <!-- identity transform for everything else -->
  <xsl:template match="/|comment()|processing-instruction()|*|@*">
    <xsl:copy>
      <xsl:apply-templates />
    </xsl:copy>
  </xsl:template>

  <!-- remove NS from XHTML elements -->
  <xsl:template match="*[namespace-uri() = '']">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates select="@*|node()" />
    </xsl:element>
  </xsl:template>

  <!-- remove NS from XHTML attributes -->
  <xsl:template match="@*[namespace-uri() = '']">
    <xsl:attribute name="{local-name()}">
      <xsl:value-of select="." />
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>
'''

xslt_doc = ET.fromstring(xslt)
transform = ET.XSLT(xslt_doc)
dom = transform(dom)

# tostring() returns bytes when an encoding is given; decode them for printing.
print(ET.tostring(dom, pretty_print=True,
                  encoding='utf-8').decode('utf-8'))


<html xmlns="">
<head>
<title>MathJax Test Page</title>
<script type="text/javascript">function test() {
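For a self-contained illustration of the same idea, here is a minimal sketch with a much smaller input document. It assumes the elements to be un-prefixed live in the standard XHTML namespace, http://www.w3.org/1999/xhtml, and it uses the standard XSLT namespace URI. The sample document, the XHTML_NS constant, and the simplified stylesheet (it handles elements only, not namespaced attributes) are illustrative and not taken from the answer.

```python
import lxml.etree as ET

# Assumption: the prefixed elements are in the standard XHTML namespace.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Small illustrative input; bytes are used because the text carries an
# XML declaration with an encoding.
content = b"""<?xml version='1.0' encoding='utf-8'?>
<h:html xmlns:h="http://www.w3.org/1999/xhtml">
<h:head><h:title>Test Page</h:title></h:head>
<h:body><h:p class="intro">Hello</h:p></h:body>
</h:html>"""

# The stylesheet's own default namespace is XHTML, so xsl:element with a
# bare local name creates an unprefixed element in that namespace.
xslt = b"""<xsl:stylesheet version="1.0"
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="no"/>

  <!-- identity transform for everything else -->
  <xsl:template match="/|comment()|processing-instruction()|*|@*">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- re-create XHTML elements without a prefix -->
  <xsl:template match="*[namespace-uri() = 'http://www.w3.org/1999/xhtml']">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>"""

dom = ET.fromstring(content)
transform = ET.XSLT(ET.fromstring(xslt))
result = transform(dom)

root = result.getroot()
serialized = ET.tostring(result, encoding="utf-8")
print(serialized.decode("utf-8"))
```

The elements stay in the XHTML namespace after the transformation; only the way that namespace is written (default declaration instead of a prefix) changes.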

Solution 3:

To expand on @neirbowj's answer, but using ET.Element and ET.SubElement, and rendering a document with a mix of namespaces, where the root happens to be explicitly namespaced and a subelement (channel) is the default namespace:

import lxml.etree as ET

# I set up but don't use the default namespace:
root = ET.Element('{}RDF', nsmap={None: ''})
# I use the default namespace by including its URL in curly braces:
e = ET.SubElement(root, '{}channel')
print(ET.tostring(root, xml_declaration=True, encoding='utf8').decode())

This will print out the following:

<?xml version='1.0' encoding='utf8'?>
<rdf:RDF xmlns="" xmlns:rdf=""><channel/></rdf:RDF>

It automatically uses rdf as the prefix for the RDF namespace. I'm not sure how it figures that out. If I want to specify the prefix myself, I can add it to the nsmap on the root element:

nsmap = {None: '',
         'doge': ''}
root = ET.Element('{}RDF', nsmap=nsmap)
e = ET.SubElement(root, '{}channel')
print(ET.tostring(root, xml_declaration=True, encoding='utf8').decode())

...and I get this:

<?xml version='1.0' encoding='utf8'?>
<doge:RDF xmlns:doge="" xmlns=""><channel/></doge:RDF>
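To make the pattern easy to try out, here is a minimal sketch with concrete namespace URIs filled in. RDF_NS is the standard RDF syntax namespace; DEFAULT_NS is a hypothetical placeholder URI invented for this example, as are the constant names themselves. The prefix for the root is given explicitly in the nsmap rather than left for lxml to pick.

```python
import lxml.etree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DEFAULT_NS = "http://example.org/default-ns"  # hypothetical placeholder URI

# None maps to the default (unprefixed) namespace; "rdf" is an explicit prefix.
nsmap = {None: DEFAULT_NS, "rdf": RDF_NS}

# The root is explicitly namespaced, so it is serialized with the rdf prefix.
root = ET.Element("{%s}RDF" % RDF_NS, nsmap=nsmap)

# The child is in the default namespace, so it is serialized without a prefix.
channel = ET.SubElement(root, "{%s}channel" % DEFAULT_NS)

serialized = ET.tostring(root, xml_declaration=True, encoding="UTF-8").decode("UTF-8")
print(serialized)
```

Writing the namespace in curly braces in front of the local name is what ties each element to an entry in the nsmap; the prefix (or the absence of one) is then looked up from that map at serialization time.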

