Creating our spider

This is the code for our first spider. Save it in a file named MySpider.py under the spiders directory in your project:

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.linkextractors.lxmlhtml import LxmlLinkExtractor
from scrapy.selector import HtmlXPathSelector
from scrapy.item import Item

class MySpider(CrawlSpider):
    name = 'example.com'
    allowed_domains = ['example.com']
    start_urls = ['http://www.example.com']

    # Follow every extracted link (an empty allow tuple matches all URLs)
    # and hand each downloaded page to parse_item. Note the trailing comma:
    # rules must be a tuple of Rule objects.
    rules = (
        Rule(LxmlLinkExtractor(allow=()), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        hxs = HtmlXPathSelector(response)
        # Use hxs.select(...) here to extract data into the item.
        element = Item()
        return element

CrawlSpider provides a mechanism for following links that match a certain pattern. In addition to the attributes inherited from the BaseSpider class, this class defines a new rules attribute, through which we can tell the spider which links to follow and what to do with each page it fetches.
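The allow (and deny) arguments accepted by the link extractor are regular expressions applied to each discovered URL. As a rough sketch of the filtering idea only (the patterns and URLs below are made up for illustration, and the real extractor does considerably more), the behavior can be approximated with the standard re module:

```python
import re

# Hypothetical patterns, as might be passed to a Rule's link extractor:
# LxmlLinkExtractor(allow=(r'category\.php',), deny=(r'[?&]print=1',))
allow = [re.compile(r'category\.php')]
deny = [re.compile(r'[?&]print=1')]

def should_follow(url):
    """Keep a URL only if it matches an allow pattern and no deny pattern."""
    if allow and not any(p.search(url) for p in allow):
        return False
    if any(p.search(url) for p in deny):
        return False
    return True

links = [
    'http://www.example.com/category.php?id=2',
    'http://www.example.com/category.php?id=2&print=1',
    'http://www.example.com/about.html',
]
followed = [u for u in links if should_follow(u)]
print(followed)  # only the first URL passes both filters
```

An empty allow tuple, as used in our spider above, disables the allow filter entirely, so every link on the page is followed.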
