I have come across a website whose pages consist solely of JavaScript. The site hosts videos that I would like to mirror. When I open it in Firefox and view the page source, I see only script tags that import JavaScript files. When I inspect the page with Firefox's Inspector, however, I see HTML, including a video tag. I presume the JavaScript generated that HTML. (I have never programmed in JavaScript, so I do not know its intricacies.)
How do I go about executing the JavaScript after downloading the original page, so that I get the HTML it generates? I am after a command-line tool that does this. I will then parse the resulting HTML with a regex to extract the video source file name. My current idea for a bash script that grabs the videos is structured like this:
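    wget -O inThePage.html the.website.com/page/of/javascript/
    executeJavaScriptAndBuildHTML < inThePage.html |  # the tool I am looking for
        # extract the video file name; assumes a double-quoted src on the <video> line
        sed -n 's/.*<video[^>]*src="\([^"]*\)".*/\1/p' |
        while read -r aVideoFileName; do
            wget "$aVideoFileName"
        done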
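The regex step can be tried on its own. The sketch below is illustrative only: the function name extract_video_sources and the sample markup are hypothetical. It assumes GNU sed (for `\n` in the replacement and `\|` alternation) and that each src attribute is double-quoted and written inside its own `<video>` or `<source>` tag. Real generated HTML may not satisfy this, which is the usual weakness of parsing HTML with regexes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the src URL of every <video> or <source> tag
# read from standard input, one per line.
# Assumes GNU sed and double-quoted src attributes.
extract_video_sources() {
    # First split the input so every tag starts its own line (several tags
    # may share one line of HTML), then keep only the src value of
    # <video> and <source> tags. The [[:space:]] before src skips data-src.
    sed 's/</\n</g' |
        sed -n 's/^<\(video\|source\)[^>]*[[:space:]]src="\([^"]*\)".*/\2/p'
}

# Usage with a made-up sample of generated HTML:
extract_video_sources <<'EOF'
<div><video controls src="https://example.com/clip1.mp4"></video></div>
<video><source src="https://example.com/clip2.webm" type="video/webm"></video>
EOF
```

Splitting on `<` first keeps the second regex simple: each candidate tag is anchored at the start of a line, so a greedy `.*` cannot skip over one tag to reach the next.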
I wonder whether such a tool exists, since JavaScript is usually executed within a full graphical web browser.
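One kind of program that might fill the executeJavaScriptAndBuildHTML slot is a browser run in headless mode (no window). The sketch below assumes a Chromium-based browser is installed. Its `--headless` mode accepts a `--dump-dom` switch that prints the page's DOM to standard output after the page has loaded. The binary name differs between systems (chromium, chromium-browser, google-chrome), so a hypothetical CHROME variable selects it. Unlike the filter in the script above, this takes the page's URL rather than the saved HTML. That lets the browser fetch the page's script imports itself. Content that scripts insert well after loading may still be missing from the dump.

```shell
#!/usr/bin/env bash
# Sketch of executeJavaScriptAndBuildHTML using a headless Chromium-based browser.
# Assumption: the browser binary is "chromium" unless the hypothetical
# CHROME variable names another one (e.g. CHROME=google-chrome).
render_page() {
    local url="$1"
    # --headless runs without a window; --dump-dom prints the DOM to stdout
    # once the page has loaded. Browser log noise on stderr is discarded.
    "${CHROME:-chromium}" --headless --dump-dom "$url" 2>/dev/null
}

# Usage (placeholder URL from the question, with an explicit scheme):
#   render_page 'http://the.website.com/page/of/javascript/' > rendered.html
```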
I have an inkling that the whole point of the JavaScript-only page is to prevent this kind of automated downloading.