
To extract URLs from a site, it is usually enough to run:

lynx -dump -listonly https://soundcloud.com/grubstakers > urls.txt

But this only gives me the latest episodes, not the URLs of all of them (along with some spurious URLs).

Is it possible to do this with the lynx browser, or is JavaScript responsible for loading the rest of the links when you scroll down in a GUI browser?

1 Answer


You can use something like this:

https://api-v2.soundcloud.com/stream/users/394696287?client_id=qWUPqUOvYPTG1SDjwXJCNm9gOwM3rNeP&limit=200

That returns 146 entries, which I believe is all they currently have. For more prolific artists, you'll need to use pagination (see the sketch after the example below). Here is an example in PHP, but you can do this with any language that supports HTTP and JSON:

<?php
// Stream endpoint for the user's posts (the number is the user's ID).
$url = 'https://api-v2.soundcloud.com/stream/users/394696287';
$query = http_build_query([
   'client_id' => 'qWUPqUOvYPTG1SDjwXJCNm9gOwM3rNeP',
   'limit' => 200
]);
// Fetch the JSON response and decode it into an object.
$json = file_get_contents($url . '?' . $query);
$data = json_decode($json);
// Each entry in "collection" wraps a track; print its permalink URL.
foreach ($data->collection as $entry) {
   echo $entry->track->permalink_url, "\n";
}
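
For pagination, a minimal sketch might look like the following. This assumes the response carries a next_href field pointing at the next page, and that next_href needs the client_id re-appended; the api-v2 endpoints are not officially documented, so treat both as assumptions:

<?php
$clientId = 'qWUPqUOvYPTG1SDjwXJCNm9gOwM3rNeP';
$url = 'https://api-v2.soundcloud.com/stream/users/394696287?'
     . http_build_query(['client_id' => $clientId, 'limit' => 200]);

// Follow next_href until no further page is advertised.
while ($url !== null) {
   $page = json_decode(file_get_contents($url));
   foreach ($page->collection as $entry) {
      // Skip entries that are not tracks (e.g. playlist reposts).
      if (isset($entry->track)) {
         echo $entry->track->permalink_url, "\n";
      }
   }
   // Assumption: next_href lacks the client_id, so re-append it.
   $url = !empty($page->next_href)
        ? $page->next_href . '&client_id=' . $clientId
        : null;
}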
  • How would you proceed with this link? How would you extract the URLs from there? Commented Feb 22, 2020 at 19:53
  • Thanks for the update. But how do you get the initial URL given the URL of the SoundCloud page? Commented Feb 23, 2020 at 8:51

