Python - Get Header Information From Url

December 27, 2023

I've been searching all around for a Python 3.x code sample to get HTTP header information. Something as simple as an equivalent of PHP's get_headers cannot be found easily in Python.

Solution 1:

To get an HTTP response code in python-3.x, use the urllib.request module:

>>> import urllib.request
>>> response = urllib.request.urlopen(url)
>>> response.getcode()
200
>>> if response.getcode() == 200:
...     print('Bingo')
...
Bingo

The returned HTTPResponse Object will give you access to all of the headers, as well. For example:

>>> response.getheader('Server')
'Apache/2.2.16 (Debian)'
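As a rough equivalent of PHP's get_headers, the same HTTPResponse object can be used to collect every header at once. The following is a minimal sketch, not a definitive implementation: the helper name get_headers and its timeout parameter are assumptions made for illustration, and it sends a HEAD request so that the body is not downloaded.

```python
import urllib.request


def get_headers(url, timeout=10):
    """Return the response headers of `url` as a dict (hypothetical helper)."""
    # A HEAD request asks the server for the headers only, without the body.
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # response.headers behaves like a mapping of header names to values.
        # Converting to a dict keeps only one value for repeated headers
        # such as Set-Cookie.
        return dict(response.headers.items())
```

Calling get_headers("http://www.example.com/") should return a dictionary whose keys are header names such as 'Content-Type'. Some servers do not answer HEAD requests properly, in which case a plain GET may be needed instead.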

If the call to urllib.request.urlopen() fails because the server returned an error status, a urllib.error.HTTPError exception is raised. You can handle this to get the response code:

import urllib.error
import urllib.request

try:
    response = urllib.request.urlopen(url)
    if response.getcode() == 200:
        print('Bingo')
    else:
        print('The response code was not 200, but: {}'.format(
            response.getcode()))
except urllib.error.HTTPError as e:
    # HTTPError carries the status code of the failed request.
    print('''An error occurred: {}
The response code was {}'''.format(e, e.getcode()))

Solution 2:

For Python 2.x

urllib, urllib2 or httplib can be used here. Note, however, that urllib and urllib2 both use httplib. So if you plan to do this check a lot (1000s of times), it would be better to use httplib directly. Additional documentation and examples are here.

Example code:

import httplib
try:
    h = httplib.HTTPConnection("www.google.com")
    h.connect()
except Exception as ex:
    print "Could not connect to page."

For Python 3.x

A similar story to urllib (or urllib2) and httplib from Python 2.x applies to the urllib.request and http.client libraries in Python 3.x. Again, http.client should be quicker. For more documentation and examples look here.

Example code:

import http.client

try:
    conn = http.client.HTTPConnection("www.google.com")
    conn.connect()    
except Exception as ex:
    print("Could not connect to page.")

If you want to check the status code as well, replace

conn.connect()

with

conn.request("GET", "/index.html")  # Could also use "HEAD" instead of "GET".
res = conn.getresponse()
if res.status == 200 or res.status == 302:  # Specify codes here.
    print("Page Found!")

Note, in both examples, if you would like to catch the specific exception relating to when the URL doesn't exist, rather than all of them, catch the socket.gaierror exception instead (see the socket documentation).
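For the Python 3.x variant, catching only socket.gaierror could look like the sketch below. The helper name page_status and its parameters are hypothetical, and the choice to return None for an unresolvable host name is just one possible convention.

```python
import http.client
import socket


def page_status(host, path="/", port=80, timeout=10):
    """Return the HTTP status for host and path, or None if the host
    name cannot be resolved (hypothetical helper)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # HEAD returns the status line and headers without the body.
        conn.request("HEAD", path)
        return conn.getresponse().status
    except socket.gaierror:
        # Raised when the address lookup for the host name fails.
        return None
    finally:
        conn.close()
```

Other failures, such as a refused connection or a timeout, are deliberately not caught here and will still propagate to the caller.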

Solution 3:

You can use the requests module to check it:

import requests
url = "http://www.example.com/"
res = requests.get(url)
if res.status_code == 200:
    print("bingo")

You can also inspect the response headers before downloading the whole content of the webpage, for example by sending a HEAD request.
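One way to look at the headers without fetching the body is requests.head(). The snippet below is a minimal sketch under that assumption; the helper name fetch_headers and the timeout value are illustrative and not part of the original answer.

```python
import requests


def fetch_headers(url, timeout=10):
    """Return (status_code, headers) for `url` without downloading the body
    (hypothetical helper)."""
    # requests.head() sends a HEAD request. Unlike requests.get(), it does
    # not follow redirects unless allow_redirects=True is passed.
    res = requests.head(url, timeout=timeout)
    # res.headers is a case-insensitive mapping of header names to values.
    return res.status_code, res.headers
```

For example, status, headers = fetch_headers("http://www.example.com/") followed by headers.get("Content-Type") reads a single header, and the lookup works regardless of the header name's capitalization.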

Solution 4:

You can use the urllib2 library (Python 2.x only):

import urllib2
if urllib2.urlopen(url).code == 200:
    print "Bingo"

Python Manual