All Questions
1,810
questions
-2
votes
1
answer
39
views
xpath issue in nested div
New to Python/Scrapy. I am testing responses via XPath in the console and am able to print the h1 header as a test using the code below. Now I am trying to select the XPath to pull the (1) job title, (...
0
votes
1
answer
39
views
Scrapy / extracting data across multiple HTML tags
Newbie to Scrapy, but catching up fast. There is one thing I can't figure out, though, despite Googling and Copiloting, so I appreciate your patience :) I have some HTML that looks like this:
<p>
&...
1
vote
1
answer
23
views
How to Extract the Name 'Terence Crawford' from an HTML Segment, Excluding the Span Element?
I am currently facing difficulty retrieving the name 'Terence Crawford' from an HTML segment. The challenge lies in excluding the span element, which is present within the same parent element.
<td ...
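One possible approach, sketched on hypothetical markup because the question's HTML is cut off above: in XPath, `text()` matches only an element's own text nodes, so text nested inside a child `<span>` is left out. The snippet below uses the standard library's ElementTree as a stand-in, where `.text` is the text that precedes the first child element. The `<td>`/`<span>` layout and the span content are assumptions, not the asker's actual markup.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the real <td> from the question is truncated.
html = "<td>Terence Crawford<span> (c)</span></td>"
cell = ET.fromstring(html)

# .text holds only the text that precedes the first child element,
# so the <span> content is excluded.
name = (cell.text or "").strip()

# itertext() walks all descendants, so it includes the span text as well.
full = "".join(cell.itertext())

print(name)
print(full)
```

In a Scrapy spider the rough equivalent would be something like `response.xpath('//td/text()').get()`, with the path adjusted to the real page structure.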
0
votes
0
answers
57
views
Scrapy carousel categories not extracting
I am trying to scrape a website to get the list of categories from a carousel, but it's not working.
Below is my code:
import scrapy
class CourtsmuSpider(scrapy.Spider):
    name = "courtsmu"
...
1
vote
1
answer
257
views
Get the Element Name From Attribute Value Using Xpath
I am trying to get the element/tag name of each node where I have a particular attribute value.
I have an xml:
<a node='1'>This</a>
<b node='2'>Is</b>
<c node='23'>A</...
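As a rough illustration: in XPath 1.0, `name()` returns the element name of the first node in a node-set, for example `name(//*[@node='2'])`. The sketch below does the same lookup with the standard library's ElementTree by selecting on the attribute and reading `.tag`. The sample document is hypothetical, modelled on the excerpt above, and the `<root>` wrapper and the `tag_names` helper are assumptions added for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modelled on the excerpt; the <root> wrapper is assumed.
xml = "<root><a node='1'>This</a><b node='2'>Is</b><c node='23'>A</c></root>"
root = ET.fromstring(xml)

def tag_names(root, value):
    """Return the tag name of every element whose node attribute equals value."""
    return [el.tag for el in root.findall(f".//*[@node='{value}']")]

print(tag_names(root, "2"))
```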
1
vote
2
answers
66
views
Scrapy response returns an empty array
I'm crawling this page with scrapy and I'm trying to extract all the rows of the main table.
The following XPath expression should give me the wanted result:
//div[@id='TableWithRules']//tbody/tr
...
0
votes
0
answers
48
views
My Xpaths don't work in Scrapy Splash, but work in Selenium
I am trying to scrape a list of all the scholarships at https://bigfuture.collegeboard.org/scholarships/; I was able to scrape all of the links and store them in a list using Selenium. However, ...
0
votes
0
answers
21
views
How to use scrapy to scrape the value of a HTML element tag which is JSON with xpath or in another way?
I am using Scrapy to scrape a page, and so far I have had success with XPath, but I'm struggling with this one. I am trying to get the value of dimensionsImageKey:
<img id="fullViewImg"...
0
votes
1
answer
63
views
Scrapy - xpath returns empty list
I'm scraping restaurant reviews from Yelp, specifically from this URL.
I'm trying to get the list of review containers and, after testing with the Chrome console, that would be given by the following ...
0
votes
1
answer
35
views
Scrapy - Only first url in url list is scraped
I'm scraping reviews from restaurants in Rome, Milan and Bergamo. For each one of those cities there's one dedicated url containing 30 or more restaurants. The scraper starts crawling the Rome ...
0
votes
2
answers
56
views
Scrapy XPath - @href returning unexpected value
I'm currently web-scraping restaurant reviews from Tripadvisor and I'm trying to retrieve restaurant links from this page.
I want the links of the 30 restaurant pages in the bottom part but I'm making ...
1
vote
1
answer
74
views
Find string in text of script element
I'm trying to scrape a page where I want to wait until a string has been detected in a script element before returning the page's HTML.
Here's my MRE scraper:
from scrapy import Request, Spider
from ...
-1
votes
2
answers
24
views
Using following::* in XPath, but the data repeats whenever there is more than one matching tag name
I am trying to get all the data that follows a certain tag name, but it returns everything and repeats the output when more than one such tag is present. For example:
<ul>
<li>
<p&...
0
votes
2
answers
74
views
how to get element with xpath by attribute value
Trying to get the next-page arrow link from this page. However,
response.selector.xpath('//a[aria-label="Next page"]') yields []. help! :)
I was able to solve it with response.selector.xpath('...
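For context on why the first expression comes back empty: inside an XPath predicate, a bare name such as `aria-label` is read as a child element test, while attributes need the `@` prefix. A minimal sketch on hypothetical pagination markup (the real page is not shown in the question), using the standard library's ElementTree as a stand-in for Scrapy's selectors:

```python
import xml.etree.ElementTree as ET

# Hypothetical pagination markup; the real page is not shown in the question.
page = ET.fromstring(
    '<div><a aria-label="Previous page" href="/p1">prev</a>'
    '<a aria-label="Next page" href="/p3">next</a></div>'
)

# Without "@", aria-label is read as a child element name: no match.
no_at = page.findall(".//a[aria-label='Next page']")

# With "@", the predicate tests the attribute.
with_at = page.findall(".//a[@aria-label='Next page']")

print(len(no_at), [a.get("href") for a in with_at])
```

The same distinction applies to `response.xpath(...)` in Scrapy, so `//a[@aria-label="Next page"]` is the form to try there.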
1
vote
1
answer
35
views
How do you extract tag values and selectors with Scrapy, as opposed to the tag content?
I have been trying to scrape a site that is not ideally structured. Information within one set of tags is required to understand information in another set of tags, but the second set of tags are not ...