Executing Scrapy

With Scrapy, we can collect the extracted information and save it in a file in one of the supported formats (XML, JSON, or CSV), or even store it directly in a database using an item pipeline. In this case, we are executing the scrapy crawl command, passing JSON as the output format:

$ scrapy crawl <crawler_name> -o items.json -t json
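
The <crawler_name> placeholder refers to the value of the name attribute defined in the spider class. As a minimal sketch (the file name, class name, start URL, and extracted fields are illustrative placeholders), a spider could look like this:

# myspider.py -- a minimal spider; the class name, start URL, and
# extracted fields are placeholders for illustration
import scrapy

class MySpider(scrapy.Spider):
    # this value is what you pass to "scrapy crawl <crawler_name>"
    name = 'myspider'
    start_urls = ['http://example.com']

    def parse(self, response):
        # yield one item per link found on the page
        for link in response.css('a'):
            yield {'title': link.css('::text').get(),
                   'url': link.css('::attr(href)').get()}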

The last two parameters indicate that the extracted data is stored in a file called items.json and that the JSON exporter is used. The same can be done to export to the CSV and XML formats.

The -o option takes the name of the output file that will contain the extracted data, while the -t option selects the export format; for example, -t json exports the data as JSON.

With the -t csv option, we will obtain a CSV file with the results of the crawling process:

$ scrapy crawl <crawler_name> -o items.csv -t csv

With the -t json option, we will obtain a JSON file with the results of the crawling process:

$ scrapy crawl <crawler_name> -o items.json -t json

With the -t xml option, we will obtain an XML file with the results of the crawling process:

$ scrapy crawl <crawler_name> -o items.xml -t xml
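
As mentioned at the beginning of this section, the extracted items can also be stored directly in a database by means of an item pipeline. The following is a minimal sketch of such a pipeline using SQLite; the database file, table name, and item fields are assumptions for illustration, and the pipeline still has to be enabled through the ITEM_PIPELINES setting in settings.py:

# pipelines.py -- hypothetical pipeline that stores items in SQLite
import sqlite3

class SQLitePipeline:
    def open_spider(self, spider):
        # items.db and the items table are illustrative names
        self.connection = sqlite3.connect('items.db')
        self.connection.execute(
            'CREATE TABLE IF NOT EXISTS items (title TEXT, url TEXT)')

    def close_spider(self, spider):
        self.connection.commit()
        self.connection.close()

    def process_item(self, item, spider):
        # assumes each item carries title and url fields
        self.connection.execute(
            'INSERT INTO items VALUES (?, ?)',
            (item.get('title'), item.get('url')))
        return item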

The runspider command tells Scrapy to run a spider directly from a standalone Python file, without needing to create a project:

$ scrapy runspider spider-template.py
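
For example, the myspider.py file sketched earlier is self-contained, so it can be run this way without creating a project:

$ scrapy runspider myspider.py -o items.json -t json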