Crawl And Scrape A Complete Site With Scrapy
import scrapy
from scrapy import Request

# scrapy crawl jobs9 -o jobs9.csv -t csv

class JobsSpider(scrapy.Spider):
    name = 'jobs9'
    allowed_domains = ['vapedonia.com']
    start_urls = [
Solution 1:
You need to use a CrawlSpider with rules in this case. Below is a simple translation of your spider:
import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class JobsSpider(CrawlSpider):
    name = "jobs9"
    allowed_domains = ["vapedonia.com"]
    start_urls = ["https://www.vapedonia.com"]

    # Follow every link matching the category URL pattern and pass the
    # responses to parse_category.
    rules = (
        Rule(LinkExtractor(allow=(r"https://www.vapedonia.com/\d+.*",)),
             callback='parse_category'),
    )

    def parse_category(self, response):
        products = response.xpath('//div[@class="product-container clearfix"]')
        for product in products:
            image = product.xpath('div[@class="center_block"]/a/img/@src').extract_first()
            link = product.xpath('div[@class="center_block"]/a/@href').extract_first()
            name = product.xpath('div[@class="right_block"]/p/a/text()').extract_first()
            # The original .encode("utf-8") was a Python 2 workaround; on
            # Python 3, extract_first() already returns str.
            price = product.xpath(
                'div[@class="right_block"]/div[@class="content_price"]/span[@class="price"]/text()'
            ).extract_first()
            yield {'Image': image, 'Link': link, 'Name': name, 'Price': price}
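You can run this spider and export the results with the same command noted in the question's snippet; in recent Scrapy versions the output format is inferred from the file extension, so the -t flag can be dropped:

scrapy crawl jobs9 -o jobs9.csv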
Take a look at the different spider types described at https://doc.scrapy.org/en/latest/topics/spiders.html.