Beautiful Soup Not Pulling All The Html Of A Webpage
I'm practicing with BeautifulSoup and trying to pull the image addresses of football player images from this website: https://www.transfermarkt.com/jordon-ibe/profil/spiel
Solution 1:
The site seems to check whether the request's User-Agent header is valid, so you need to add the header like this:
import urllib3
import certifi

url = 'https://www.transfermarkt.com/jordon-ibe/profil/spieler/195652'
# Verify HTTPS certificates against certifi's CA bundle
http = urllib3.PoolManager(cert_reqs='CERT_REQUIRED', ca_certs=certifi.where())
# Send a browser-like User-Agent; without it the site answers 404
response = http.request('GET', url, headers={'User-Agent': 'Mozilla/5.0'})
print(response.status)
This prints 200. If you remove the headers, you get 404.
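If you would rather avoid the urllib3 and certifi dependencies, here is a minimal sketch of the same request using only the standard library's urllib.request. The helpers build_request and fetch are hypothetical names for illustration, and the blank-value check simply encodes the assumption that an empty User-Agent is rejected. Note that urlopen raises urllib.error.HTTPError for a 404 instead of returning a response object.

```python
import urllib.request

# URL taken from the answer above
URL = "https://www.transfermarkt.com/jordon-ibe/profil/spieler/195652"


def build_request(url, user_agent="Mozilla/5.0"):
    """Build a GET request that carries an explicit User-Agent header.

    Assumption: the site rejects requests whose User-Agent is blank,
    so refuse to build one with an empty or whitespace-only value.
    """
    if not user_agent.strip():
        raise ValueError("User-Agent must be non-empty")
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Send the request and return (status, body bytes); needs network access.

    A 404 surfaces as urllib.error.HTTPError rather than a return value.
    """
    with urllib.request.urlopen(build_request(url), timeout=10) as resp:
        return resp.status, resp.read()
```

Calling fetch(URL) performs the real network request; with the header set it should come back with status 200, though that depends on how the site behaves at the time.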
Any non-empty User-Agent value (after trimming whitespace) seems to work.
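Once the request succeeds, the HTML can be handed to BeautifulSoup to collect the image addresses. This is a minimal sketch, assuming beautifulsoup4 is installed; extract_image_urls is a hypothetical helper, and it gathers the src of every <img> tag because the exact markup of the player portrait on transfermarkt is not shown here.

```python
from bs4 import BeautifulSoup


def extract_image_urls(html):
    """Return the src of every <img> tag that has one, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    # src=True keeps only <img> tags that actually carry a src attribute
    return [img["src"] for img in soup.find_all("img", src=True)]
```

With the urllib3 code above, extract_image_urls(response.data) would list the image URLs, since BeautifulSoup also accepts raw bytes. If the page lazy-loads its images, the real address may live in a different attribute, so inspect the returned HTML before narrowing the search with soup.select(...).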