Scrapinghub

The first step is to register with the Scrapinghub service, which can be done at the following URL: https://app.scrapinghub.com/account/login/.

Scrapy Cloud is a platform for running web crawlers and spiders, where spiders execute on cloud servers and scale on demand: https://scrapinghub.com/scrapy-cloud.

To deploy projects to Scrapy Cloud, you will need the Scrapinghub command-line client, called shub, which can be installed with the pip install command. The same command also makes sure you have the latest version:

$ pip install shub --upgrade

The next step is to create a project in Scrapinghub and deploy your Scrapy project.

When you create a Scrapy Cloud project, you will find your API key and the project's ID on the project's Code & Deploys page.
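With the API key and project ID at hand, the deployment can also be driven from the terminal with shub. The following is a minimal sketch, assuming a hypothetical project ID of 12345; shub login prompts for the API key shown on the Code & Deploys page, and shub deploy must be run from the root of your Scrapy project:

$ shub login
$ shub deploy 12345

After the upload finishes, shub prints a link to the deployed project in Scrapy Cloud.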

When the spider is deployed, you can go to your project page and schedule or run the spider there.
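The spider can also be scheduled from the command line. A short sketch, assuming the hypothetical project ID 12345 and a spider named myspider (replace both with your own values):

$ shub schedule 12345/myspider

shub responds with the ID of the job it has scheduled, together with a link for watching it run in the dashboard.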

When you run the spider, you will be redirected to the project dashboard, where you can check the state of your spider and the items and data extracted. Once the process is finished, the job is automatically moved to the completed jobs section.
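If you prefer the terminal, the log of a running or finished job can be retrieved with shub as well. A sketch, assuming a hypothetical job ID of 12345/1/3, as reported by shub schedule:

$ shub log 12345/1/3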

We can also open the job details, where the extracted data is available in the job items section.
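The same items can be downloaded from the command line in JSON Lines format. A sketch, again assuming the hypothetical job ID 12345/1/3:

$ shub items 12345/1/3 > items.jl

Each line of items.jl is one scraped item encoded as JSON, so the file can be processed with standard tools or loaded back into Python.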
