
Python, Mechanize - Request Disallowed By robots.txt Even After set_handle_robots And add_headers

I have made a web crawler which gets all links down to the first level of a page, and from those it gets all links and text, plus image links and alt text. Here is the whole code: import urllib import …
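The question's code is cut off after the imports. As a rough illustration of the setup being described (the URL and header values below are placeholders, not taken from the original post), a first-level link pass with mechanize might look like this, and it can still be rejected with a 403 by servers that filter on more than the User-Agent:

import mechanize

# Placeholder URL -- the original question's target site is not shown.
START_URL = "http://example.com/"

br = mechanize.Browser()
br.set_handle_robots(False)                       # stop honouring robots.txt
br.addheaders = [("User-Agent", "Mozilla/5.0")]   # minimal header override

response = br.open(START_URL)
for link in br.links():                           # first-level links on the page
    print(link.url, link.text)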

Solution 1:

Ok, so the same problem appeared in this question:

Why is mechanize throwing a HTTP 403 error?

Sending all the request headers a normal browser would send, and accepting and sending back the cookies the server sets, should resolve the issue.
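A minimal sketch of that fix with mechanize follows; the header values and URL are examples only, so copy the real headers from your own browser's network inspector if the site still refuses the request:

import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)                # don't honour robots.txt
br.set_handle_refresh(False)               # avoid hanging on meta-refresh loops
br.set_cookiejar(mechanize.CookieJar())    # accept and resend server cookies

# Mimic a real browser's request headers; exact values are illustrative.
br.addheaders = [
    ("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/120.0 Safari/537.36"),
    ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"),
    ("Accept-Language", "en-US,en;q=0.5"),
    ("Connection", "keep-alive"),
]

response = br.open("http://example.com/")  # placeholder URL
html = response.read()

Keeping a cookie jar attached matters because many sites set a session cookie on the first response and return 403 to any follow-up request that does not send it back.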
