Executing EuroPython spider

We can execute our spider with the following command:

scrapy crawl europython_spider -o europython_items.json -t json

At the end of the process, we obtain the following output files:

  • europython_items.json
  • europython_items.xml
  • europython.sqlite

The XML and SQLite files are generated by the classes defined in the pipelines.py file, whereas the JSON file is generated automatically through the -o option of the crawl command.
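To make the role of pipelines.py more concrete, here is a minimal sketch of what a SQLite pipeline class could look like. It is not the project's actual code: the class name SqlitePipeline, the sessions table, and the title and author fields are all assumptions chosen for illustration. Only the output file name europython.sqlite and the three hook methods that Scrapy calls on a pipeline (open_spider, process_item, close_spider) are taken as given.

```python
import sqlite3


class SqlitePipeline:
    """Hypothetical pipeline that stores each scraped item in a SQLite file."""

    def __init__(self, db_path="europython.sqlite"):
        # Scrapy builds the pipeline without arguments, so the default applies.
        self.db_path = db_path
        self.connection = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the file, create the table.
        self.connection = sqlite3.connect(self.db_path)
        self.connection.execute(
            "CREATE TABLE IF NOT EXISTS sessions (title TEXT, author TEXT)"
        )

    def process_item(self, item, spider):
        # Called for every scraped item; field names are assumed, not real.
        self.connection.execute(
            "INSERT INTO sessions (title, author) VALUES (?, ?)",
            (item.get("title"), item.get("author")),
        )
        # Return the item so later pipelines and the feed export still see it.
        return item

    def close_spider(self, spider):
        # Called once when the spider finishes: persist and release the file.
        self.connection.commit()
        self.connection.close()
```

A class like this only runs if it is listed in the ITEM_PIPELINES setting of the project; an XML pipeline would follow the same three-method shape.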

Spiders can also receive arguments passed to the crawl command with the -a option. For example, the following command extracts the session data for EuroPython 2018 from http://ep2018.europython.eu/en/events/sessions:

scrapy crawl europython_spider -a year=2018 -o europython_items.json -t json
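Scrapy hands each -a name=value pair to the spider's constructor as a string keyword argument. The sketch below shows one way a spider might turn the year argument into its start URL. The helper name sessions_url, the class name in the comment, and the assumption that other editions follow the same URL pattern as 2018 are illustrative and not taken from the project.

```python
def sessions_url(year="2018"):
    """Build the sessions URL for one EuroPython edition (pattern assumed)."""
    # Values given with -a arrive as strings; str() also accepts an int here.
    year = str(year).strip()
    if not (year.isdigit() and len(year) == 4):
        raise ValueError(f"year must be a four-digit number, got {year!r}")
    return f"http://ep{year}.europython.eu/en/events/sessions"


# Inside the spider, the argument could be consumed like this (left as a
# comment so the sketch runs without Scrapy installed):
#
# class EuropythonSpider(scrapy.Spider):
#     name = "europython_spider"
#
#     def __init__(self, year="2018", *args, **kwargs):
#         super().__init__(*args, **kwargs)
#         self.start_urls = [sessions_url(year)]
```

Giving the argument a default value keeps the plain crawl command, without -a, working as before.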

The following screenshot shows the JSON file generated by the previous command:
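The exported file can also be checked from Python. This is a minimal sketch, assuming the file holds a single JSON array of item objects, which is what Scrapy's JSON feed export writes; the function name load_items is illustrative.

```python
import json


def load_items(path="europython_items.json"):
    """Load the scraped items from a JSON feed export file."""
    with open(path, encoding="utf-8") as handle:
        items = json.load(handle)
    # A JSON feed export is one array with one object per scraped item.
    if not isinstance(items, list):
        raise ValueError("expected a JSON array of items")
    return items
```

Calling len(load_items()) after a crawl gives a quick count of the scraped sessions. Because -o appends to an existing file, running the crawl twice with the same output name can leave a file that is no longer valid JSON, so it is safer to delete the old file first.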

The crawl also generates a SQLite file, which we can open with the SQLite browser tool to inspect the structure of the generated table:
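The same structure can be read without a graphical tool by using the sqlite3 module from the standard library. The sketch below is an illustration only: the function name describe_tables is hypothetical, and it makes no assumption about which tables or columns the pipeline actually created.

```python
import sqlite3


def describe_tables(db_path="europython.sqlite"):
    """Return {table_name: [(column_name, column_type), ...]} for a SQLite file."""
    connection = sqlite3.connect(db_path)
    try:
        # sqlite_master lists every object in the database; keep tables only.
        tables = [
            row[0]
            for row in connection.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        structure = {}
        for table in tables:
            # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
            columns = connection.execute(
                f'PRAGMA table_info("{table}")'
            ).fetchall()
            structure[table] = [(column[1], column[2]) for column in columns]
        return structure
    finally:
        connection.close()
```

Printing the returned dictionary after a crawl shows each table name with its columns and declared types.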
