Python, Mechanize - Request Disallowed By Robots.txt Even After Set_handle_robots And Add_headers
I have made a web crawler which gets all links down to the first level of the page, and from them it gets all links and text, plus image links and alt text. Here is the whole code: import urllib import …
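The code in the post is cut off, but a first-level crawler along these lines reproduces the situation described; the seed URL and header value are placeholders, not taken from the original post:

```python
import mechanize

# Hypothetical seed URL -- the original post's code is truncated.
SEED = "http://example.com/"

br = mechanize.Browser()
br.set_handle_robots(False)          # ignore robots.txt
br.addheaders = [("User-Agent", "my-crawler")]

# First level: collect every link on the seed page.
br.open(SEED)
first_level = [link.absolute_url for link in br.links()]

# Second pass: visit each collected link and gather its links in turn.
for url in first_level:
    try:
        br.open(url)
        for link in br.links():
            print(link.absolute_url, link.text)
    except mechanize.HTTPError as err:
        # This is where the 403 / "disallowed by robots.txt" errors appear.
        print("failed:", url, err)
```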
Solution 1:
Ok, so the same problem appeared in this question:
Why is mechanize throwing a HTTP 403 error?
Sending all the request headers a normal browser would send, and accepting and sending back the cookies the server sets, should resolve the issue.
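A minimal sketch of that approach with mechanize (the target URL and header values are placeholders): disable robots.txt handling, attach a cookie jar so server cookies are stored and returned, and set the kind of headers a desktop browser would send.

```python
import http.cookiejar

import mechanize

# Hypothetical target URL -- substitute the site your crawler fetches.
URL = "http://example.com/"

br = mechanize.Browser()

# Ignore robots.txt and follow redirects like a browser would.
br.set_handle_robots(False)
br.set_handle_redirect(True)

# Store the cookies the server sets and send them back on later requests.
cookie_jar = http.cookiejar.CookieJar()
br.set_cookiejar(cookie_jar)

# Send the headers a normal desktop browser would send.
br.addheaders = [
    ("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/91.0 Safari/537.36"),
    ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"),
    ("Accept-Language", "en-US,en;q=0.5"),
]

response = br.open(URL)
print(response.read()[:200])
```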