
Crawl And Scrape A Complete Site With Scrapy

import scrapy
from scrapy import Request

# scrapy crawl jobs9 -o jobs9.csv -t csv

class JobsSpider(scrapy.Spider):
    name = 'jobs9'
    allowed_domains = ['vapedonia.com']
    start_urls = [

Solution 1:

You need to use a CrawlSpider with rules in this case. Below is a simple translation of your scraper:

import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor


class JobsSpider(CrawlSpider):
    name = "jobs9"
    allowed_domains = ["vapedonia.com"]
    start_urls = ["https://www.vapedonia.com"]

    # Follow every link whose URL matches the category pattern and
    # hand each response to parse_category
    rules = (
        Rule(LinkExtractor(allow=(r"https://www.vapedonia.com/\d+.*",)),
             callback='parse_category'),
    )

    def parse_category(self, response):
        products = response.xpath('//div[@class="product-container clearfix"]')
        for product in products:
            image = product.xpath('div[@class="center_block"]/a/img/@src').extract_first()
            link = product.xpath('div[@class="center_block"]/a/@href').extract_first()
            name = product.xpath('div[@class="right_block"]/p/a/text()').extract_first()
            # extract_first() returns a str (or None), so no .encode() is needed
            price = product.xpath(
                'div[@class="right_block"]/div[@class="content_price"]/span[@class="price"]/text()'
            ).extract_first()
            yield {'Image': image, 'Link': link, 'Name': name, 'Price': price}
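To run the spider and write the scraped items to CSV, the command already noted in the comment at the top of the question applies (with recent Scrapy versions the -t csv flag can be omitted, since the feed format is inferred from the file extension):

scrapy crawl jobs9 -o jobs9.csv -t csv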

Look at the different spider types at https://doc.scrapy.org/en/latest/topics/spiders.html
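For comparison, the same crawl can be written with the plain scrapy.Spider the question started from, by following category links manually with Request objects. This is only a sketch: the spider name jobs9_manual is made up to avoid clashing with the spider above, and parse_category would reuse the extraction code from the CrawlSpider version.

import re

import scrapy
from scrapy import Request


class JobsManualSpider(scrapy.Spider):
    name = 'jobs9_manual'  # hypothetical name, just for this sketch
    allowed_domains = ['vapedonia.com']
    start_urls = ['https://www.vapedonia.com']

    def parse(self, response):
        # Follow category links by hand; CrawlSpider's Rule/LinkExtractor
        # automates exactly this pattern
        for href in response.xpath('//a/@href').extract():
            url = response.urljoin(href)
            if re.match(r"https://www.vapedonia.com/\d+.*", url):
                yield Request(url, callback=self.parse_category)

    def parse_category(self, response):
        # same product extraction as in the CrawlSpider above
        ...

The Rule-based version is usually preferable: it is shorter, and LinkExtractor handles link deduplication within each page for you.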

