Scrapy JSON export issues

By : user141072
Date : November 22 2020, 10:54 AM
Multiple issues here.
The main problem is in invalid expressions inside the select() calls.
code :
import scrapy

from scrapy_demo.items import ScrapyDemoItem


class ScrapyDemoSpider(scrapy.Spider):
    name = "scrapy_demo"
    allowed_domains = ["buffalo.craigslist.org"]
    start_urls = ['http://buffalo.craigslist.org/search/cps/']

    def parse(self, response):
        # processing listings
        for listing in response.css('p.row > a[data-id]'):
            link = listing.xpath('@href').extract()[0]
            # resolve the (possibly relative) href against the current page URL
            yield scrapy.Request(response.urljoin(link), callback=self.parse_listing_page)

        # following next page
        next_page = response.xpath('//a[contains(@class, "next")]/@href').extract()
        if next_page:
            yield scrapy.Request(response.urljoin(next_page[0]), callback=self.parse)

    def parse_listing_page(self, response):
        # build one item per listing page
        item = ScrapyDemoItem()
        item['link'] = response.url
        item['title'] = response.xpath('//title/text()').extract()[0].strip()
        item['content'] = response.xpath('//section[@id="postingbody"]/text()').extract()[0].strip()
        yield item
Sample items from the resulting JSON export:
    {"content": "Using a web cam with your computer to video communicate with your loved ones has never been made easier and it's free (providing you have an Internet connection).  With the click of a few buttons, you are sharing your live video and audio with the person you are communicating with. It's that simple.  When you are seeing and hearing your grand kids live across the country or halfway around the world, web camming is the next best thing to being there!", "link": "http://buffalo.craigslist.org/cps/4784390462.html", "title": "Web Cam With Your Computer With Family And Friends"},
    {"content": "Looking to supplement or increase your earnings?", "link": "http://buffalo.craigslist.org/cps/4782757517.html", "title": "1k in 30 Day's"},
    {"content": "Like us on Facebook: https://www.facebook.com/pages/NFB-Systems/514380315268768", "link": "http://buffalo.craigslist.org/cps/4813039886.html", "title": "NFB SYSTEMS COMPUTER SERVICES + WEB DESIGNING"},
    {"content": "Like us on Facebook: https://www.facebook.com/pages/NFB-Systems/514380315268768", "link": "http://buffalo.craigslist.org/cps/4810219714.html", "title": "NFB Systems Computer Repair + Web Designing"},
    {"content": "I can work with you personally and we design your site together (no outsourcing or anything like that!) I'll even train you how to use your brand new site. (Wordpress is really easy to use once it is setup!)", "link": "http://buffalo.craigslist.org/cps/4792628034.html", "title": "I Make First-Class Wordpress Sites with Training"},
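The spider resolves every extracted href against the URL of the page it came from before requesting it. Here is a minimal sketch of that resolution step on its own, using the standard library's urljoin (Python 3). The sample hrefs, including index100.html, are hypothetical and only illustrate the three common cases.

```python
from urllib.parse import urljoin

page_url = "http://buffalo.craigslist.org/search/cps/"

# A root-relative href replaces the whole path of the page URL.
listing_url = urljoin(page_url, "/cps/4784390462.html")

# An href that is already absolute is returned unchanged.
absolute_url = urljoin(page_url, "http://buffalo.craigslist.org/cps/4782757517.html")

# A path-relative href (hypothetical) is resolved against the page's directory.
relative_url = urljoin(page_url, "index100.html")

print(listing_url)
print(absolute_url)
print(relative_url)
```

Resolving links this way means the spider does not need to know whether the site emits relative or absolute hrefs.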

Scrapy dynamic creation of objects + json export

By : Simi Olowolafe
Date : March 29 2020, 07:55 AM
I created a new spider to crawl a website. The crawler takes each video game listed on the site and creates an object for it. The solution was simple: create the object as follows:
code :
from scrapy import Item, Field

class GameInfo(Item):
    title = Field()
    desc = Field()
    kind = Field()
    listeBuys = Field()

# create the item and give it its own empty list to fill in
gameInfo = GameInfo()
gameInfo['listeBuys'] = []
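To see how such an object serializes, here is a rough sketch in which a plain dict stands in for the Scrapy item, so it runs without Scrapy installed. The helper make_game_info and the sample shop/price entries are hypothetical and not part of the original answer.

```python
import json

def make_game_info(title, desc, kind):
    """Build one game record; a plain dict stands in for the Scrapy item."""
    # Each record gets its own fresh list, so entries are never shared.
    return {"title": title, "desc": desc, "kind": kind, "listeBuys": []}

first = make_game_info("Game A", "First example", "puzzle")
second = make_game_info("Game B", "Second example", "racing")

# Hypothetical entries, appended only to the first record.
first["listeBuys"].append({"shop": "Shop 1", "price": 19.99})
first["listeBuys"].append({"shop": "Shop 2", "price": 17.5})

# Nested lists of dicts export to JSON without any extra work.
exported = json.dumps([first, second], indent=2)
print(exported)
```

Because the list is created per object, filling it for one game leaves the others untouched.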

Is there a way using scrapy to export each item that is scraped into a separate json file?

By : user3632636
Date : March 29 2020, 07:55 AM
You can use a Scrapy item pipeline and write each item to a separate file from there.
I set a counter in my spider that increments each time an item is yielded, and I add that value to the item. The pipeline then uses the counter value to build the file name.
code :
from scrapy import Spider

class TestSpider(Spider):
    # spider name and all
    file_counter = 0

    def parse(self, response):
        # your code here
        pass

    def parse_item(self, response):
        # your code here
        self.file_counter += 1
        item = TestItem(
            # other fields,
            counter=self.file_counter,  # used by the pipeline to name the file
        )
        yield item
# settings.py
ITEM_PIPELINES = {'test1.pipelines.TestPipeline': 100}

# pipelines.py
import json

class TestPipeline(object):

    def process_item(self, item, spider):
        with open('test_data_%s' % item.get('counter'), 'w') as wr:
            item.pop('counter') # remove the counter data, you don't need this in your item
            json.dump(dict(item), wr) # write the remaining fields as JSON
        return item
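The file-per-item idea can be tried without running a crawl. This is a minimal sketch under some assumptions: plain dicts stand in for items, the helper write_item is hypothetical, and the ".json" suffix and temporary output directory are choices made here rather than part of the answer.

```python
import json
import os
import tempfile

def write_item(item, directory):
    """Write one item to its own JSON file and return the file path."""
    data = dict(item)
    counter = data.pop("counter")  # only needed for the file name
    path = os.path.join(directory, "test_data_%s.json" % counter)
    with open(path, "w") as fh:
        json.dump(data, fh)
    return path

# Hypothetical items, each carrying the counter set by the spider.
out_dir = tempfile.mkdtemp()
items = [
    {"title": "first", "counter": 1},
    {"title": "second", "counter": 2},
]
paths = [write_item(item, out_dir) for item in items]
print(paths)
```

Copying the item before popping the counter keeps the original untouched, which matters if later pipelines still expect that field.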

Scrapy process.crawl() to export data to json

By : Shuuno
Date : March 29 2020, 07:55 AM
This might be a subquestion of "Passing arguments to process.crawl in Scrapy python", but the author accepted an answer that doesn't cover the subquestion I'm asking. You need to specify the export target in the settings:
code :
from scrapy.crawler import CrawlerProcess

process = CrawlerProcess({
    'FEED_URI': 'file:///tmp/export.json',
})
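On newer Scrapy releases (2.1 and later, as far as I know) the FEEDS setting supersedes FEED_URI and FEED_FORMAT. A sketch of the equivalent settings dict, assuming such a version, is below. MySpider in the comments is a hypothetical spider class, and the crawl calls are left commented out so the fragment runs without Scrapy.

```python
# Assumes Scrapy >= 2.1, where FEEDS maps each output URI to its options.
settings = {
    "FEEDS": {
        "file:///tmp/export.json": {"format": "json"},
    },
}

# The dict would then be passed to the crawler process, for example:
#   process = CrawlerProcess(settings)
#   process.crawl(MySpider)   # MySpider is a hypothetical spider class
#   process.start()
print(settings)
```

Stating the format explicitly avoids relying on how a given Scrapy version picks a default.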

.json export formatting in Scrapy

By : user7978550
Date : March 29 2020, 07:55 AM
In terms of memory usage it's not good practice, but one option is to keep all items in an object and write it out at the end of the process:
code :
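import json

class RautahakuPipeline(object):

    def open_spider(self, spider):
        # collect every item in memory
        self.items = { "pages":[] }
        self.file = None # opened in close_spider

    def close_spider(self, spider):
        # write everything in one go when the spider finishes
        self.file = open('items.json', 'w')
        self.file.write(json.dumps(self.items))
        self.file.close()

    def process_item(self, item, spider):
        self.items["pages"].append(dict(item))
        return item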
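A lower-memory alternative is to write each item to the file as it arrives:
code :
import json

class RautahakuPipeline(object):

    def open_spider(self, spider):
        self.file = open('items.json', 'w')
        self.first_item = True
        header='{"pages": ['
        self.file.write(header)

    def close_spider(self, spider):
        # close the JSON array and the enclosing object
        self.file.write(']}')
        self.file.close()

    def process_item(self, item, spider):
        # separate items with commas so the file stays valid JSON
        if not self.first_item:
            self.file.write(',\n')
        self.first_item = False
        line = json.dumps(dict(item))
        self.file.write(line)
        return item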
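If a single enclosing JSON document is not required, a different technique from the one in the answer is JSON Lines, which Scrapy also offers as its built-in "jsonlines" feed format. Each item is written as one JSON object per line, so nothing has to be held in memory and no header or footer is needed. A minimal sketch with an in-memory buffer follows. The helper names and sample pages are hypothetical.

```python
import io
import json

def write_json_lines(items, stream):
    """Write each item as one JSON object per line (JSON Lines)."""
    for item in items:
        stream.write(json.dumps(dict(item)) + "\n")

def read_json_lines(stream):
    """Parse the stream back, one JSON object per non-empty line."""
    return [json.loads(line) for line in stream if line.strip()]

# Hypothetical items standing in for scraped pages.
pages = [{"url": "http://example.com/1"}, {"url": "http://example.com/2"}]

buffer = io.StringIO()
write_json_lines(pages, buffer)

# Rewind and read the lines back to confirm each one parses on its own.
buffer.seek(0)
restored = read_json_lines(buffer)
print(restored)
```

The trade-off is that the output is a sequence of JSON documents rather than one, so consumers have to read it line by line.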

Scrapy: How to export Json from script

By : zuku
Date : March 29 2020, 07:55 AM
I created a web crawler with Scrapy, but I have a problem with the phone number because it is inside a script tag. This is simple if you have already extracted the contents of the script tag:
code :
import re

script = '{"@context":"http://schema.org","@type":"LocalBusiness","name":"Clínica Dental Reina Victoria 23","description":".TU CLÍNICA DENTAL DE REFERENCIA EN MADRID","logo":"https://estaticos.qdq.com/CMS/directory/logos/c/l/clinica-dental-reina-victoria.png","image":"https://estaticos.qdq.com/coverphotos/098/535/ed1c5ffcf38241f8b83a1808af51a615.jpg","url":"https://www.clinicadental-reinavictoria.es/","hasMap":"https://www.google.com/maps/search/?api=1&query=40.4469174,-3.7087934","telephone":"+34915340309","address":{"@type":"PostalAddress","streetAddress":"Av. Reina Victoria 23","addressLocality":"MADRID","addressRegion":"Madrid","postalCode":"28003"}}'

phone_number = re.search(r'"telephone":"(.*?)","address"', script).group(1)
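Since the script contents are themselves valid JSON, an alternative to the regular expression is to parse them with the standard json module and read the key directly. This does not depend on "telephone" being followed by "address". The sketch below uses a trimmed copy of the script string to keep it short.

```python
import json

# Trimmed copy of the script contents, shortened for illustration.
script = (
    '{"@context":"http://schema.org","@type":"LocalBusiness",'
    '"name":"Clínica Dental Reina Victoria 23",'
    '"telephone":"+34915340309",'
    '"address":{"@type":"PostalAddress","postalCode":"28003"}}'
)

data = json.loads(script)

# .get() returns None instead of raising if a key is missing.
phone_number = data.get("telephone")
postal_code = data.get("address", {}).get("postalCode")
print(phone_number, postal_code)
```

Parsing the whole object also gives access to the other fields, such as the nested address, without writing more patterns.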

