How To Parse Raw Http Request In Python 3?
I am looking for a native way to parse an HTTP request in Python 3. This question shows a way to do it in Python 2, but it uses now-deprecated modules (and Python 2), and I am looking
Solution 1:
You could use the email.message.Message class from the email module in the standard library.
The following Python 3 example, adapted from the answer to the question you linked, parses HTTP headers.
Suppose you wanted to create a dictionary containing all of your header fields:
import email
import pprint
from io import StringIO
request_string = 'GET / HTTP/1.1\r\nHost: localhost\r\nConnection: keep-alive\r\nCache-Control: max-age=0\r\nUpgrade-Insecure-Requests: 1\r\nUser-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36\r\nAccept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8\r\nAccept-Encoding: gzip, deflate, sdch\r\nAccept-Language: en-US,en;q=0.8'
# pop the first line so we only process headers
_, headers = request_string.split('\r\n', 1)
# construct a message from the request string
message = email.message_from_file(StringIO(headers))
# construct a dictionary containing the headers
headers = dict(message.items())
# pretty-print the dictionary of headers
pprint.pprint(headers, width=160)
If you ran this at a Python prompt, the result would look like:
{'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
'Accept-Encoding': 'gzip, deflate, sdch',
'Accept-Language': 'en-US,en;q=0.8',
'Cache-Control': 'max-age=0',
'Connection': 'keep-alive',
'Host': 'localhost',
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'}
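The example above discards the request line. As a minimal sketch, assuming the request arrives as a single str with CRLF line endings, the same approach can be extended to keep the method, target, and version as well. The helper name parse_request is hypothetical and not part of the standard library; only email.message_from_string is a real library call here.

```python
import email

def parse_request(raw):
    """Split a raw HTTP request string into its parts (illustrative helper)."""
    # Separate the head (request line + headers) from the optional body.
    head, _, body = raw.partition('\r\n\r\n')
    # The first line of the head is the request line; the rest are headers.
    request_line, _, header_block = head.partition('\r\n')
    method, target, version = request_line.split(' ', 2)
    # A Message object allows case-insensitive header lookup.
    headers = email.message_from_string(header_block)
    return method, target, version, headers, body

method, target, version, headers, body = parse_request(
    'GET /index.html HTTP/1.1\r\nHost: localhost:8080\r\nConnection: keep-alive\r\n\r\n'
)
print(method, target, version)  # GET /index.html HTTP/1.1
print(headers['host'])          # localhost:8080
```

Returning the Message object rather than a dict keeps header lookup case-insensitive, so headers['host'] and headers['Host'] give the same value.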
Solution 2:
Each header field is terminated by a carriage return followed by a newline (\r\n), and within each field the name and value are separated by a colon. So assuming you already have the request as a string (called resp here), it should be as easy as:
fields = resp.split("\r\n")
fields = fields[1:]  # ignore the GET / HTTP/1.1
output = {}
for field in fields:
    # split on the first colon only, since values may contain colons
    key, value = field.split(':', 1)
    output[key] = value
Update 4/13
Using the example HTTP request from the linked post:
resp = 'GET /search?sourceid=chrome&ie=UTF-8&q=ergterst HTTP/1.1\r\nHost: www.google.com\r\nConnection: keep-alive\r\nAccept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5\r\nUser-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_6; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.45 Safari/534.13\r\nAccept-Encoding: gzip,deflate,sdch\r\nAvail-Dictionary: GeNLY2f-\r\nAccept-Language: en-US,en;q=0.8\r\n'
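fields = resp.split("\r\n")
fields = fields[1:]  # ignore the GET / HTTP/1.1
output = {}
for field in fields:
    if not field:
        continue  # skip the empty string left by the trailing \r\n
    # split on the first colon only, since values may contain colons
    key, value = field.split(':', 1)
    output[key] = value
print(output)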
An additional check is needed to make sure field is not empty, because the trailing \r\n leaves an empty string at the end of the split result. Output:
{'Host': ' www.google.com', 'Connection': ' keep-alive', 'Accept': ' application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5', 'User-Agent': ' Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_6; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.45 Safari/534.13', 'Accept-Encoding': ' gzip,deflate,sdch', 'Avail-Dictionary': ' GeNLY2f-', 'Accept-Language': ' en-US,en;q=0.8'}
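Note that the values in the output above keep the space that follows each colon. A slightly more defensive variant, offered only as a sketch, strips that whitespace, stops at the blank line that ends the header section, and skips lines that contain no colon. The function name parse_headers and the choice to lower-case the field names are assumptions made for illustration, not part of the original answer.

```python
def parse_headers(raw_request):
    """Return a dict of header fields from a raw HTTP request string."""
    lines = raw_request.split('\r\n')
    headers = {}
    # Skip the request line (e.g. "GET / HTTP/1.1") and walk the header lines.
    for line in lines[1:]:
        if not line:
            break  # a blank line marks the end of the header section
        # partition splits on the first colon only, so values may contain colons
        name, sep, value = line.partition(':')
        if not sep:
            continue  # no colon found: skip the malformed line
        # Header names are case-insensitive, so normalize them to lower case.
        headers[name.strip().lower()] = value.strip()
    return headers

example = 'GET / HTTP/1.1\r\nHost: localhost:8080\r\nAccept: */*\r\n\r\nbody: ignored'
print(parse_headers(example))
```

Because a plain dict keeps only one value per key, repeated header fields would overwrite each other in this sketch.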