
How To Tell BeautifulSoup To Extract The Content Of A Specific Tag As Text? (Without Touching It)

I need to parse an HTML document which contains 'code' tags. I'm getting the code blocks like this:

soup = BeautifulSoup(str(content))
code_blocks = soup.findAll('code')

The proble

Solution 1:

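Add the code tag to the QUOTE_TAGS dictionary. BeautifulSoup 3 treats the contents of any tag listed there as literal text rather than markup, so the angle brackets inside the code block are left as they are.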

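from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3

content = "<code class='csharp'>List<Person> persons = new List<Person>();</code>"

# Register 'code' as a quote tag so its contents are kept as literal text.
# QUOTE_TAGS is a class-level dictionary, so this affects every later parse.
BeautifulSoup.QUOTE_TAGS['code'] = None
soup = BeautifulSoup(content)
code_blocks = soup.findAll('code')
print(code_blocks)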

Output:

[<code class="csharp">List<Person> persons = new List<Person>();</code>]
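Where patching QUOTE_TAGS is not an option, a different technique (not the one used in the solution above) is to escape the contents of every code element before the markup reaches any parser. The sketch below uses only the Python 3 standard library. The helper name escape_code_blocks is hypothetical, and the regular expression assumes that code elements are not nested and that no attribute value in the opening tag contains a '>' character.

```python
import html
import re

# Matches an opening <code ...> tag, its raw contents, and the closing tag.
# Assumption: <code> elements are not nested and attributes contain no '>'.
CODE_BLOCK = re.compile(r"(<code\b[^>]*>)(.*?)(</code>)", re.DOTALL | re.IGNORECASE)


def escape_code_blocks(markup):
    """Escape &, < and > inside every <code> element (hypothetical helper)."""
    def _escape(match):
        opening, body, closing = match.groups()
        # quote=False leaves quotation marks inside the code untouched.
        return opening + html.escape(body, quote=False) + closing

    return CODE_BLOCK.sub(_escape, markup)


content = "<code class='csharp'>List<Person> persons = new List<Person>();</code>"
escaped = escape_code_blocks(content)
print(escaped)
# <code class='csharp'>List&lt;Person&gt; persons = new List&lt;Person&gt;();</code>
```

A parser given the escaped string sees the contents of the code element as plain text. Calling html.unescape on the extracted text restores the original characters.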
