8

I'm trying to get all the separate sections of a Wikipedia article through the API.

What I already know:

How do I retrieve all sections separately with one request (for example, as a JSON array)?

3 Answers

8

What you're asking for is called parsing, because it requires interpreting the wikitext source to split the page into sections. So the solution is given at https://www.mediawiki.org/wiki/API:Parsing_wikitext

1) Get the list of sections: https://www.mediawiki.org/w/api.php?action=parse&page=API:Parsing_wikitext&prop=sections

2) Request the parsed text of a given section: https://www.mediawiki.org/w/api.php?action=parse&page=API:Parsing_wikitext&section=1&prop=text
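For example, a minimal sketch of the two steps from the command line (format=json is added so the output is machine-readable; the jq paths follow the default formatversion=1 response layout):

curl -s "https://www.mediawiki.org/w/api.php?action=parse&page=API:Parsing_wikitext&prop=sections&format=json" |\
 jq '.parse.sections'      # each entry has an "index" and a "line" (heading) field

curl -s "https://www.mediawiki.org/w/api.php?action=parse&page=API:Parsing_wikitext&section=1&prop=text&format=json" |\
 jq -r '.parse.text."*"'   # parsed HTML of section 1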

  • Thanks for the answer. How would you get this data in plaintext? contentformat flag doesn't seem to work. Any other way?
    – Vinay W
    Commented May 31, 2018 at 7:37
  • @VinayWadhwa that's handled by a separate parsing API (linked from the document above), TextExtracts.
    – Nemo
    Commented May 31, 2018 at 10:31
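For what it's worth, a rough sketch of a TextExtracts request for plaintext (this assumes the TextExtracts extension is installed, as it is on Wikipedia; the explaintext and exsectionformat=plain parameters and the page title are just examples):

curl -s "https://en.wikipedia.org/w/api.php?action=query&prop=extracts&explaintext&exsectionformat=plain&titles=Egyptian_pyramids&format=json" |\
 jq -r '.query.pages[].extract'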
6

I realize this question was asked four years ago, so possibly the following was not available then:

You can use the REST API described here: https://www.mediawiki.org/wiki/REST_API

The REST endpoints are described/documented here: https://en.wikipedia.org/api/rest_v1/#/

The mobile-sections endpoint (intended for serving content to mobile clients) gives you a nice breakdown with headings, which sounds like what you are asking for.

Alternatively, the metadata endpoint returns a toc (table of contents) section which contains the same breakdown of headings.

Here is an example URL, fetching the mobile sections for the "Egyptian pyramids" page: https://en.wikipedia.org/api/rest_v1/page/mobile-sections/Egyptian_pyramids

The advantage is that the response is in JSON format (which is what you were asking for).
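As a rough sketch, you could pull just the headings out of that response with jq (the "remaining", "sections" and "line" field names reflect the response documented at the endpoint above and may change between API versions):

curl -s "https://en.wikipedia.org/api/rest_v1/page/mobile-sections/Egyptian_pyramids" |\
 jq -r '.remaining.sections[] | select(.line != null) | .line'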

  • YOU ARE AMAZING.
    – Zerquix18
    Commented Dec 1, 2019 at 20:19
  • Can we get a list of all the links available under the 'See Also' section?
    – Abhi
    Commented May 2, 2021 at 5:33
0

As has been pointed out, you can't do it all in one request, but you can do it all in one line (broken out here for readability):

curl -s "http://myserver/mywiki/api.php?action=parse&format=json&page=Testpage&prop=sections" |\
 jq -r '.parse.sections[] | .index' |\
 xargs -I {} -n 1  curl -s "http://myserver/mywiki/api.php?action=parse&page=Testpage&format=json&prop=wikitext&section={}" |\
 jq '.parse.wikitext."*"' | xargs -I {} -0 -n 1 echo -e {}

Explanation:

  1. curl -s keeps it quiet; you may need -k for HTTPS.
  2. The first jq grabs the indexes of the returned array, i.e. the sections.
  3. xargs then fetches each section as JSON,
  4. and the second jq pulls out the wikitext of each section,
  5. finally passing each one to echo -e to interpret escapes.
  6. The -0 stops metacharacters from being interpreted by xargs.

This of course does not look much different from grabbing the whole page, but you can change the first jq slightly to

jq -r ".parse.sections[] | select(.line == \"$section\") | .index"

and limit it to one section. You did not ask for this, but it's useful as a poor man's supplement to man pages: written as a bash function, one could recall a specifically named, condensed section of a larger self-linked page at the command line. man doesn't cover everything, and it has been around since the start of Unix precisely because no one can remember everything and get it right (especially not ChatGPT). Thanks Nemo for your original answer.
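For instance, a rough sketch of such a bash function (wikisect is a made-up name, myserver/mywiki/Testpage are placeholders as above, and error handling is omitted):

wikisect () {
  local page="$1" heading="$2" api="http://myserver/mywiki/api.php" idx
  # look up the numeric index of the section whose heading matches
  idx=$(curl -s "${api}?action=parse&format=json&page=${page}&prop=sections" |\
   jq -r ".parse.sections[] | select(.line == \"${heading}\") | .index")
  # fetch and print the wikitext of just that section
  curl -s "${api}?action=parse&format=json&page=${page}&prop=wikitext&section=${idx}" |\
   jq -r '.parse.wikitext."*"'
}

# usage: wikisect Testpage "Some heading"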
