Questions tagged [beautifulsoup]
Beautiful Soup is a Python package for parsing HTML/XML. The latest version of this package is version 4, imported as bs4.
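As a minimal sketch of what that looks like in practice (the HTML snippet and variable names below are made up for illustration, and the standard-library "html.parser" backend is just one possible parser choice):

```python
from bs4 import BeautifulSoup

# Hypothetical HTML fragment, used only for illustration.
html = """
<html><body>
  <h1>Title</h1>
  <a href="/a">First</a>
  <a href="/b">Second</a>
</body></html>
"""

# Parse with the built-in parser so no extra parser package is required.
soup = BeautifulSoup(html, "html.parser")

# Text of the first <h1> element.
heading = soup.h1.get_text()

# The href attribute of every <a> element, in document order.
links = [a["href"] for a in soup.find_all("a")]

print(heading, links)
```

Other parsers such as lxml can be passed instead of "html.parser" if they are installed; results on malformed markup may differ between parsers.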
beautifulsoup
32,806
questions
-1
votes
0
answers
15
views
Crawl data from the IMDb Top 250 Movies
Please, I need someone to help me. I can't understand why I only crawl 25 movies instead of 250. My code:
import pandas as pd
import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': '...
-2
votes
1
answer
42
views
Scraping Wholefoods Amazon resulting in 200 none [closed]
Running my scraping code results in the following error:
https://www.amazon.com:443 "GET /s?k=Chicken&i=wholefoods&disableAutoscoping=true HTTP/1.1" 200 None
urls = {
'Whole Foods ...
0
votes
0
answers
14
views
Scraping Amazon Shopping Cart in Real time via a chrome extension
I am trying to build a Chrome extension that scrapes the Amazon website in real time while the user is on it and gets the subtotal in the user's cart before they proceed to checkout. We will use this ...
0
votes
0
answers
44
views
BeautifulSoup works when running code from the Python command line, but not when running a Python script from a shell script
I am interested in extracting data added to a database each day using a cronjob. The cronjob is supposed to run at hourly increments to extract the data by executing a shell script that then runs a ...
-4
votes
0
answers
28
views
Extracting skill requirements from jobs posted on LinkedIn [closed]
In the image we can see its tag name and tag attributes, but I am unable to extract it.
I have tried the code below and several other possible tags and attributes but am still getting ...
-3
votes
0
answers
26
views
Integrating web scraping and LLMs [closed]
I wanted to extract some information about a specific drug (let's say Rolvedon) from this site.
I tried using BeautifulSoup and Scrapy, but they seem to be very format-dependent. I want the code to be ...
0
votes
1
answer
66
views
Why does my list inconsistently print to different lengths every time I run the code when I don't change anything?
I'm currently trying to scrape this website using BeautifulSoup in PyCharm to sort all the articles from most upvotes to fewest: https://news.ycombinator.com/news
I have successfully parsed the ...
-1
votes
1
answer
62
views
How do I scrape a web page that shows information on scroll rather than with an index?
I'm learning web scraping, and I'm trying to get data from a page that shows information as you scroll. What can I do in this scenario? Is there a function to make the entire page load? I am using ...
1
vote
1
answer
40
views
How to get image src of carousel posts when scraping Instagram with Selenium
I'm trying to scrape Instagram photos with Selenium. The script is working to get the first image of all types of posts (single, video, carousels) but when I try to get the src of any subsequent ...
0
votes
1
answer
31
views
Trying to build a Python API, but getting an error that the Quart library is not found
Exception has occurred: ModuleNotFoundError
No module named 'quart'> File Path "C:\xxxxxS\xx\xxxl\NewProjects\xxxx\my_api_env\app.py", line 3, in
from quart import Quart, request, ...
1
vote
0
answers
51
views
Response ended prematurely while scraping a web page inside a cron job
I created a cron job to execute the script every 24 hours. I noticed that this error occurs when the code is executed by the cron process; on my local machine I did not notice this problem.
...
0
votes
2
answers
52
views
Web scraper is not grabbing the desired text
I am trying to scrape the SKU and description on this site:
https://www.milwaukeetool.com/products/power-tools/drilling/drill-drivers
but it won't scrape the desired elements despite the code being ...
0
votes
1
answer
46
views
Selenium WebDriverWait try/finally statement fails even if expected condition is met
I'm following the documentation on the Selenium website for how to wait for Ajax responses before proceeding, and while the correct dynamically loaded information is found, a timeout error is still ...
-2
votes
2
answers
41
views
Extract an Image from a Web Page
Every day, I need to manually extract the central image from two URLs. I decided to automate this process and, with the help of ChatGPT, I have the following code:
# %%
from datetime import datetime, ...
-2
votes
0
answers
23
views
Extract data from StackOverflow Enterprise [closed]
I want to extract questions and answers from PayPal's Stack Overflow. For that, I was thinking of using an API to extract the data, but I am not sure how to log in when using a GET request. I am ...