I would like to know the Scrapy equivalent of requests'
r.content
For example, say I have this script:
import requests
import pandas as pd
url = "https://www.example.com"
r = requests.get(url)
pd.read_html(r.content)
would return a list of tables if the page at that URL contains any. What is the equivalent in Scrapy?
I have tried:
response.body
response.text
but neither worked for me.
If I try:
pd.read_html(response.content)
I get:
AttributeError: 'HtmlResponse' object has no attribute 'content'
So what is the equivalent that lets me read pandas tables directly from the response?
Tried example:
import scrapy
import pandas as pd
from scrapy.crawler import CrawlerProcess

class GsmSpider(scrapy.Spider):
    name = 'gsm'

    def start_requests(self):
        yield scrapy.Request(
            url="https://www.gsmarena.com/makers.php3",
            callback=self.parse,
        )

    def parse(self, response):
        data = pd.read_html(response.text)
        yield data

process = CrawlerProcess(
    settings={
        'FEED_URI': 'data.jl',
        'FEED_FORMAT': 'jsonlines',
    }
)
process.crawl(GsmSpider)
process.start()
pd.read_html(response.text)
does parse the page, but the spider then fails with:
ERROR: Spider must return request, item, or None, got 'list'