Creating a project with Scrapy

Before you start working with Scrapy, you need to create a project to hold your code. To create one, run the following command from the console:

scrapy startproject helloProject

This command will create a helloProject directory with the following contents:

helloProject/
    scrapy.cfg            # deploy configuration file
    helloProject/         # project's Python module, you'll import your code from here
        __init__.py
        items.py          # project items file
        pipelines.py      # project pipelines file
        settings.py       # project settings file
        spiders/          # a directory where you'll later put your spiders
            __init__.py

Each project includes the following main components:

  • items.py: Where we define the elements to extract
  • spiders: The heart of the project, where we define the extraction procedure
  • pipelines.py: Where we process what has been extracted, for example validating data and cleaning HTML code
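To make the role of pipelines.py more concrete, here is a minimal sketch of a pipeline that validates an item and strips HTML tags from one of its fields. The class name CleanHtmlPipeline and the field names title and description are hypothetical, chosen only for illustration. The sketch assumes the long-standing process_item(item, spider) method signature, and it raises a plain ValueError to stay self-contained; in a real project the usual choice for rejecting an item is scrapy.exceptions.DropItem.

```python
import re

# Naive tag matcher, good enough for a sketch; real HTML may need a proper parser.
TAG_RE = re.compile(r"<[^>]+>")


class CleanHtmlPipeline:
    """Hypothetical pipeline: validates 'title' and cleans HTML from 'description'."""

    def process_item(self, item, spider):
        # Data validation: reject items that have no title.
        if not item.get("title"):
            raise ValueError("missing title")
        # HTML cleaning: remove tags from the description, if there is one.
        if item.get("description"):
            item["description"] = TAG_RE.sub("", item["description"]).strip()
        return item


# Stand-alone usage with a plain dict; Scrapy would pass the scraped item and the spider.
pipeline = CleanHtmlPipeline()
cleaned = pipeline.process_item(
    {"title": "Example", "description": "<p>Hello <b>world</b></p>"},
    spider=None,
)
```

For Scrapy to run a pipeline like this, it would also have to be enabled through the ITEM_PIPELINES setting in settings.py.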

Once the project is created, we define the items that we want to extract, that is, the class in which the data extracted by Scrapy will be stored. In items.py, we declare one field for each piece of information that we are going to extract.
