Spiders

Spiders are classes that define how to navigate a specific site or domain and how to extract data from its pages; that is, they let us customize the crawling and parsing behavior for a particular site.

The cycle a spider follows is this:

  • First, we generate the initial requests (Request objects) to crawl the first URLs, and we specify the callback function to be called with the response (Response) downloaded for each request
  • The first requests are obtained by calling the start_requests() method, which by default generates a Request for each URL in the start_urls list, with the parse() method as the callback

Scrapy schedules and downloads these requests, and their responses are handled by the callback functions. In the callbacks, we typically parse the page content using selectors (XPath selectors) and generate items from the extracted data. Finally, the items returned by the spider can be passed to an item pipeline.
