Python, Mechanize - Request Disallowed By Robots.txt Even After Set_handle_robots And Add_headers
I have made a web crawler which gets all links down to the first level of the page, and from them it gets all links and text, plus image links and alt text. Here is the whole code: import urllib import …
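The code in the post is cut off, but a first-level crawler along these lines reproduces the situation described; the seed URL and header value are placeholders, not taken from the original post:

```python
import mechanize

# Hypothetical seed URL -- the original post's code is truncated.
SEED = "http://example.com/"

br = mechanize.Browser()
br.set_handle_robots(False)          # ignore robots.txt
br.addheaders = [("User-Agent", "my-crawler")]

# First level: collect every link on the seed page.
br.open(SEED)
first_level = [link.absolute_url for link in br.links()]

# Second pass: visit each collected link and gather its links in turn.
for url in first_level:
    try:
        br.open(url)
        for link in br.links():
            print(link.absolute_url, link.text)
    except mechanize.HTTPError as err:
        # This is where the 403 / "disallowed by robots.txt" errors appear.
        print("failed:", url, err)
```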
Solution 1:
Ok, so the same problem appeared in this question:
Why is mechanize throwing a HTTP 403 error?
Sending all the request headers a normal browser would send, and accepting and sending back the cookies the server sets, should resolve the issue.
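A minimal sketch of that approach with mechanize (the target URL and header values are placeholders): disable robots.txt handling, attach a cookie jar so server cookies are stored and returned, and set the kind of headers a desktop browser would send.

```python
import http.cookiejar

import mechanize

# Hypothetical target URL -- substitute the site your crawler fetches.
URL = "http://example.com/"

br = mechanize.Browser()

# Ignore robots.txt and follow redirects like a browser would.
br.set_handle_robots(False)
br.set_handle_redirect(True)

# Store the cookies the server sets and send them back on later requests.
cookie_jar = http.cookiejar.CookieJar()
br.set_cookiejar(cookie_jar)

# Send the headers a normal desktop browser would send.
br.addheaders = [
    ("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/91.0 Safari/537.36"),
    ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"),
    ("Accept-Language", "en-US,en;q=0.5"),
]

response = br.open(URL)
print(response.read()[:200])
```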