Scrapy : Storing The Data

August 28, 2023

I'm new to Python and Scrapy. I'm trying to follow the Scrapy tutorial, but I don't understand the logic of the storage step:

    scrapy crawl spidername -o items.json -t json

Solution 1:

You can view the options available for the crawl command by typing scrapy crawl -h from within your project directory.

scrapy crawl spidername -o items.json -t json

-o specifies the output filename for dumped items (items.json)
-t specifies the format for dumping items (json)

scrapy crawl spidername --set FEED_URI=output.csv --set FEED_FORMAT=csv

--set is used to set/override a setting
FEED_URI sets the storage backend for the dumped items. Here it is "output.csv", a plain path, so the items are written to a file on the local filesystem.
FEED_FORMAT sets the serialization format of the output feed (csv in this example).
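
The same two settings can also live in the project's settings.py instead of being passed with --set on every run. A minimal sketch, assuming the older FEED_URI / FEED_FORMAT settings used in this answer; the commented-out FEEDS dictionary is, as far as I know, their replacement in Scrapy 2.1 and later, so check it against your installed version:

```python
# settings.py (fragment), equivalent to:
#   scrapy crawl spidername --set FEED_URI=output.csv --set FEED_FORMAT=csv

FEED_URI = "output.csv"   # where the items are written (a local file here)
FEED_FORMAT = "csv"       # how the items are serialized

# Newer Scrapy versions (assumed: 2.1+) express the same thing as one setting:
# FEEDS = {"output.csv": {"format": "csv"}}
```

A value passed with --set on the command line still overrides whatever is in this file.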

Solution 2:

--set

Arguments provided by the command line are the ones that take precedence, overriding any other options.

You can explicitly override one (or more) settings using the -s (or --set) command line option.

Example:

    scrapy crawl myspider -s LOG_FILE=scrapy.log

This sets the value of the LOG_FILE setting to `scrapy.log`.
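
To picture the precedence rule, here is a simplified sketch using plain dictionaries. It is not Scrapy's implementation (Scrapy tracks a priority per setting); the names DEFAULTS, PROJECT, and COMMAND_LINE and the values in them are made up for illustration:

```python
from collections import ChainMap

# Hypothetical layers, listed from lowest to highest precedence.
DEFAULTS = {"LOG_FILE": None, "LOG_LEVEL": "DEBUG"}
PROJECT = {"LOG_LEVEL": "INFO"}              # e.g. values from settings.py
COMMAND_LINE = {"LOG_FILE": "scrapy.log"}    # e.g. -s LOG_FILE=scrapy.log

# ChainMap looks keys up left to right, so the command line wins.
settings = ChainMap(COMMAND_LINE, PROJECT, DEFAULTS)

print(settings["LOG_FILE"])   # value supplied on the command line
print(settings["LOG_LEVEL"])  # value supplied by the project layer
```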

-o

Specifies the output filename and extension, i.e. WHERE the scraped data is written.

Examples: 
    scrapy crawl quotes -o items.csv
    scrapy crawl quotes -o items.json
    scrapy crawl quotes -o items.xml
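
When -t is left out, the format is taken from the file extension. The idea can be sketched like this; guess_format is a hypothetical helper written for this answer, not a Scrapy function:

```python
import os

def guess_format(filename):
    """Return the feed format implied by a filename's extension."""
    # splitext("items.csv") gives ("items", ".csv"); drop the leading dot.
    extension = os.path.splitext(filename)[1]
    return extension.lstrip(".").lower()

for name in ("items.csv", "items.json", "items.xml"):
    print(name, "->", guess_format(name))
```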

-t

Specifies the serialisation format, i.e. HOW the items are written.
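
To make the WHERE/HOW distinction concrete, the sketch below serializes the same items in two formats using only the standard library. It imitates the shape of the output rather than calling Scrapy's exporters, and the items list is invented sample data:

```python
import csv
import io
import json

# Invented sample items, standing in for what a spider might yield.
items = [
    {"author": "A", "text": "first"},
    {"author": "B", "text": "second"},
]

def to_json(rows):
    # One JSON array containing every item.
    return json.dumps(rows)

def to_csv(rows):
    # A header row followed by one line per item.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["author", "text"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

print(to_json(items))
print(to_csv(items))
```

The data is identical in both cases; only the serialization differs, which is the choice -t makes.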

https://www.tutorialspoint.com/scrapy/scrapy_settings.htm
