
Scrapy: Data retrieved with xpath in shell but not in item



By : user2953528
Date : November 21 2020, 01:01 AM
I am building a simple web scraper using Scrapy to get the results of a football team from the BBC website. The relevant HTML comes from this page: http://www.bbc.com/sport/football/teams/bolton-wanderers/results
Shouldn't the date be extracted with sel.xpath(...) rather than response.xpath(...)? The path is relative, so it has to be evaluated against the current row inside the loop:
code :
for sel in response.xpath('//tr[@class="report"]'):
    # Query relative to the current row...
    game['date'] = sel.xpath('td[@class="match-date"]/text()').extract()
    # ...instead of against the whole response:
    # game['date'] = response.xpath('td[@class="match-date"]/text()').extract()
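The difference between a query relative to the current row and the same query against the whole document can be reproduced without Scrapy. The sketch below uses the standard library's `xml.etree.ElementTree`, whose limited XPath subset is enough for this case. The sample markup and its dates are invented for illustration and are not the BBC page's real HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup standing in for the results table.
SAMPLE = """
<table>
  <tr class="report"><td class="match-date">Sat 1 Mar</td></tr>
  <tr class="report"><td class="match-date">Sat 8 Mar</td></tr>
</table>
""".strip()

root = ET.fromstring(SAMPLE)

# A relative path evaluated against the root only looks at the root's
# direct children, so it finds no <td> here.
from_root = root.findall("td[@class='match-date']")

# The same relative path evaluated against each row finds that row's cell.
dates = []
for row in root.findall(".//tr[@class='report']"):
    cell = row.find("td[@class='match-date']")
    dates.append(cell.text)

print(from_root)
print(dates)
```

In a spider the same idea applies: inside the loop, the selector for the current row is the one to query.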


Scrapy shell XPATH not working



By : Rajeswari Priyanka
Date : March 29 2020, 07:55 AM
Scrapy cannot "parse" sites that need JavaScript execution. What the various browser developer consoles show you is the page after it has been interpreted and all of its JavaScript has run.
Since Google displays its results with the help of JavaScript, Scrapy on its own can't handle this.
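A rough way to see the effect, assuming a page whose list is filled in by a script: the markup that a plain HTTP client downloads contains no list items at all. The page source below is hypothetical, and the sketch uses only the standard library's `html.parser` rather than Scrapy.

```python
from html.parser import HTMLParser

# Hypothetical page source: the list stays empty until the script runs
# in a browser.
RAW_HTML = """
<html><body>
  <ul id="results"></ul>
  <script>
    document.getElementById("results").innerHTML = "<li>First hit</li>";
  </script>
</body></html>
"""


class ListItemCounter(HTMLParser):
    """Counts <li> start tags that exist in the markup itself."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.count += 1


counter = ListItemCounter()
counter.feed(RAW_HTML)

# The <li> only appears inside the script's string literal, which the
# parser treats as script text, not as an element.
print(counter.count)
```

A browser's developer console would show the `<li>` because it inspects the page after the script has run.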
Scrapy Shell XPath



By : user2977726
Date : March 29 2020, 07:55 AM
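I am trying to get the links and categories from the NPR news feed page at http://www.npr.org/rss/#feeds.
This is because of the first link in the results. Its text sits inside a nested <strong> element, so a/text() finds no text node directly under the link. Use // before text() to reach the descendant text nodes as well: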
code :
<a class="iconlink xml" href="/rss/rss.php?id=1001" target="blank"><strong>News Headlines</strong></a>
//ul[@class="rsslinks"]/li/a//text()
                         HERE^
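The effect of the nested `<strong>` can be checked on the link itself. This sketch uses the standard library's `xml.etree.ElementTree` instead of Scrapy; ElementTree has no `text()` node test, so `.text` and `itertext()` stand in here for `a/text()` and `a//text()` respectively.

```python
import xml.etree.ElementTree as ET

# The first link from the results, copied from the snippet above.
LINK = (
    '<a class="iconlink xml" href="/rss/rss.php?id=1001" target="blank">'
    "<strong>News Headlines</strong></a>"
)

a = ET.fromstring(LINK)

# Text directly inside <a>, before its first child: there is none.
direct_text = a.text

# Text of <a> and all of its descendants, joined together.
all_text = "".join(a.itertext())

print(direct_text)
print(all_text)
```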
XPath expression not working in scrapy spider but it is scrapy shell



By : Pete Ottoson
Date : March 29 2020, 07:55 AM
I'm working on a small project to get my head around Scrapy, and I've come across a problem with my XPath.
You forgot to return the item from the parse() callback:
code :
def parse(self, response):
    item = PhoneScraperItem()
    item['price'] = response.xpath('//div[@class="listing-content"]//meta[@itemprop="price"]/@content').extract()
    return item
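Why a missing return leaves the output empty can be shown in plain Python. In this sketch `collect_items` is a hypothetical stand-in for whatever calls the callback, not Scrapy's actual engine, and the response is faked with a dict.

```python
def parse_without_return(response):
    item = {"price": response["price"]}
    # No return statement: the function implicitly returns None,
    # so the caller never sees the item.


def parse_with_return(response):
    item = {"price": response["price"]}
    return item


def collect_items(callback, response):
    """Hypothetical caller: gathers whatever the callback hands back."""
    result = callback(response)
    if result is None:
        return []
    if isinstance(result, dict):
        return [result]
    return list(result)


fake_response = {"price": "199.00"}  # invented value for illustration

lost = collect_items(parse_without_return, fake_response)
kept = collect_items(parse_with_return, fake_response)

print(lost)
print(kept)
```

The XPath itself can be correct in both versions; only the second one hands its result back to the caller.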
Populating data with scrapy's item loader works in shell but not in spider



By : user1904192
Date : March 29 2020, 07:55 AM
You are using CrawlSpider incorrectly. CrawlSpider uses parse() internally to apply its rules, so overriding that method stops the crawl from working.
If you want to crawl a single product, just stick to the original Spider base class:
code :
from scrapy import Spider
from scrapy.loader import ItemLoader


class MySpider(Spider):
    #          ^^^^^^
    name = 'zooplus'
    allowed_domains = ['zooplus.fr']
    start_urls = [
        'https://www.zooplus.fr/shop/chats/aliments_specifiques_therapeutiques_chat/problemes_urinaires_renaux_chat/croquettes_therapeutiques_chat/595867',
    ]

    def parse(self, response):
    #   ^^^^^
        l = ItemLoader(item=dict(), response=response)
        l.add_xpath('brand', '//*[@id="js-breadcrumb"]/li[4]/a/span/text()')
        l.add_xpath('name', '//*[@id="js-product__detail"]/div[1]/div[2]/div[1]/h1/text()')
        l.add_xpath('description', '//*[@id="js-product__detail"]/div[1]/div[2]/div[1]/div[1]/meta/@content')
        l.add_value('url', response.url)
        l.add_value('last_updated', 'today')
        return l.load_item()
When using scrapy shell, I get no data from response.xpath



By : user2947195
Date : March 29 2020, 07:55 AM
I tried scraping the site using only Scrapy, and this is my result.
This is the items.py file, followed by the spider:
code :
    import scrapy

    class LifeMatchsItem(scrapy.Item):

        Event = scrapy.Field() # Name of event
        Match = scrapy.Field() # Team1 vs Team2
        Date = scrapy.Field()  # Date of Match


    # Spider module
    import scrapy
    from LifeMatchesProject.items import LifeMatchsItem


    class LifeMatchesSpider(scrapy.Spider):
        name = 'life_matches'
        start_urls = ['http://www.betfair.com/sport/home#sscpl=ro/']

        custom_settings = {'FEED_EXPORT_ENCODING': 'utf-8'}

        def parse(self, response):
            for event in response.xpath('//div[contains(@class,"events-title")]'):
                for element in event.xpath('./following-sibling::ul[1]/li'):
                    item = LifeMatchsItem()
                    item['Event'] = event.xpath('./a/@title').get()
                    item['Match'] = element.xpath('.//div[contains(@class,"event-name-info")]/a/@data-event').get()
                    item['Date'] = element.xpath('normalize-space(.//div[contains(@class,"event-name-info")]/a//span[@class="date"]/text())').get()
                    yield item
