Read title & authors of a PDF file that contains a research article

Is there any way to import a PDF file that is a research article and extract only its title, authors, journal, and publication details?

One can import the text and images of a PDF file. However, for research articles, the title, authors, and journal information may not be available under any of the import "Elements":
info = Import["file_name.pdf", "Elements"]

The goal is to organize research articles in a way that is not possible with, for instance, Mendeley.

asked Jan 9 at 20:24

Piotr

Welcome to Mathematica StackExchange! PDF files contain metadata, and you can extract it with, for example, Import["file.pdf", "Title"] (see the documentation page for other metadata). However, this will work only if the PDF actually contains (correct) metadata. Generally, I would advise using dedicated reference management software (there are plenty of options other than Mendeley), which is sometimes capable of extracting information directly from the PDF text (or of obtaining the data by querying the DOI).
– Domen
Commented Jan 10 at 10:17
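
To illustrate the metadata route from the comment above, here is a minimal sketch. The file name is a placeholder, and the list of field names is an assumption about which metadata elements a given PDF exposes, so the code first checks them against what Import reports as available. The values are only as good as whatever the publisher embedded in the file.

```wolfram
(* Hypothetical path; replace with your own PDF. *)
file = "file_name.pdf";

(* Candidate metadata fields (assumed names); keep only those this file offers. *)
fields = {"Title", "Author", "Subject", "Keywords", "CreationDate"};
available = Intersection[fields, Import[file, "Elements"]];

(* Collect the available fields into an Association; entries may be empty strings. *)
meta = AssociationMap[Import[file, #] &, available]
```

If the resulting "Title" or "Author" entries are empty or contain something like a typesetting job name, the PDF simply does not carry usable metadata and a text-based approach is needed instead.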
If it is a research article, you might imagine using the CrossRef service to look up the required information. Alternatively, you could import the text of the first page and ask your favorite LLM to extract the relevant data and return it in your desired data structure.
– Joshua Schrier
Commented Jan 10 at 14:06
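
The CrossRef suggestion could be sketched roughly as follows. This is an illustration under several assumptions, not a tested recipe: the file name is a placeholder, the DOI regular expression is a rough heuristic, the first DOI found in the text is assumed to belong to the article itself (it may instead come from a cited reference, or pick up trailing punctuation), and the lookup goes through the public Crossref REST endpoint api.crossref.org/works/ rather than any built-in service connection.

```wolfram
file = "file_name.pdf";  (* hypothetical path *)

(* Plain text of the PDF; the article's DOI is usually printed near the start. *)
text = Import[file, "Plaintext"];

(* Rough DOI pattern (an assumption); unusual DOIs may be missed or truncated. *)
dois = StringCases[text, RegularExpression["10\\.\\d{4,9}/[^\\s\"<>]+"]];

If[dois === {},
  Missing["NoDOIFound"],
  (* Query the Crossref REST API for the first match and pick out a few fields. *)
  With[{msg = Import["https://api.crossref.org/works/" <> First[dois], "RawJSON"]["message"]},
    <|
      "Title" -> msg["title"],
      "Authors" -> (StringRiffle[{Lookup[#, "given", ""], Lookup[#, "family", ""]}] & /@
          Lookup[msg, "author", {}]),
      "Journal" -> msg["container-title"],
      "DOI" -> msg["DOI"]
    |>
  ]
]
```

The sketch does no error handling: if the request fails (for example because the matched string is not a registered DOI), Import returns $Failed and the field lookups will not evaluate, so a real workflow would want to check for that before building the Association.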