I know how this tool (Power Query) can extract data from tables on websites such as Wikipedia, but I've run into a few issues. Here is what I need, ideally in Excel, though any other software would do:

[Screenshot: example spreadsheet results]

I produced that with a simple copy and paste using Match Destination Formatting, but the result won't stay dynamic. When I use the Query Editor, I immediately see that it won't work the way I'd like unless the data is in actual HTML tables. I can drill down until I find text, but I have no practical way to tell where I am in the page structure.
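To illustrate the "actual HTML tables" limitation, here is a rough Python analogy (not how Power Query is implemented): a parser that only collects text from `<td>`/`<th>` cells inside `<table>` elements, so text laid out with `<div>` or `<span>` is never picked up. `TableExtractor` and `extract_tables` are hypothetical names, and the sketch assumes well-formed markup with explicit closing tags.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect cell text from <table> elements only (hypothetical helper).

    Text in <div>, <span>, etc. outside a <table> is ignored.
    Nested tables and omitted closing tags are not handled specially.
    """

    def __init__(self):
        super().__init__()
        self.tables = []    # one list of rows per <table>
        self._in_table = 0  # > 0 while inside a <table>
        self._row = None    # cells of the current <tr>
        self._cell = None   # text fragments of the current <td>/<th>

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._in_table += 1
            self.tables.append([])
        elif self._in_table and tag == "tr":
            self._row = []
        elif self._row is not None and tag in ("td", "th"):
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None
        elif tag == "table" and self._in_table:
            self._in_table -= 1

    def handle_data(self, data):
        # Only text inside a table cell is kept.
        if self._cell is not None:
            self._cell.append(data)


def extract_tables(html_text):
    parser = TableExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.tables


page = """
<div class="headline">Markets rally</div>
<table>
  <tr><th>Ticker</th><th>Price</th></tr>
  <tr><td>ABC</td><td>10.5</td></tr>
</table>
"""
print(extract_tables(page))
# [[['Ticker', 'Price'], ['ABC', '10.5']]]
```

The headline in the `<div>` is silently dropped, which is the same kind of gap described above: data that is visually tabular but not marked up as a table has nothing for a table-oriented extractor to latch onto.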

[Screenshot: usual result]

There is no option to select the regions of a page that contain the data I want in a table unless they are already in that format. Also, some pages that would normally work have an Internet Explorer compatibility issue that I'm not sure how to get around. Being able to use Chrome or Edge instead would help.

1 Answer

This tool doesn't work well on most modern commercial sites, because their pages are very complex and dynamic. For example, they may detect your location and browser and serve you different content accordingly. Most other tools also struggle with those pages.

You will get better results by extracting from RSS feeds rather than trying to parse the entire home page. RSS is a stable, machine-readable standard that news sites use to publish stories (items). For example, Reuters lists many feeds at:

https://www.reuters.com/tools/rss

Give Power Query the URL of the feed you want, and it will quickly return a nested document. Click the Table cells to drill down, e.g. to content and then to item.
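Outside Excel, the same drill-down can be sketched in Python with the standard library. This assumes a plain RSS 2.0 feed with the usual `rss > channel > item` layout and no XML namespaces; `rss_items` and the sample feed are hypothetical, and the field names are just common RSS item elements.

```python
import xml.etree.ElementTree as ET


def rss_items(xml_text, fields=("title", "link", "pubDate")):
    """Return one dict per <item> in an RSS 2.0 document (hypothetical helper)."""
    root = ET.fromstring(xml_text)  # accepts str or bytes
    rows = []
    # Drill down explicitly: <rss> -> <channel> -> <item>
    for item in root.findall("./channel/item"):
        # findtext returns None when a child element is missing
        rows.append({name: item.findtext(name) for name in fields})
    return rows


feed = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First story</title><link>https://example.com/1</link></item>
    <item><title>Second story</title><link>https://example.com/2</link></item>
  </channel>
</rss>"""

for row in rss_items(feed):
    print(row)
```

To try it on a live feed, you could pass `urllib.request.urlopen(feed_url).read()` to `rss_items`, substituting the URL of the feed you picked. Each dict corresponds to one row of the table you would end up with after drilling down to item in Power Query.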
