
Is there a way to import the PDF of a research article and extract only its title, authors, journal, and publication details?

One can import the text and images of a PDF file, but for research articles the title, authors, and journal information may not be stored under any of the importable "Elements":
info=Import["file_name.pdf","Elements"]

The goal is to organize research articles in a way that is not possible with, for instance, Mendeley.

  • Welcome to Mathematica StackExchange! PDF files can contain metadata, and you can extract it with, for example, Import["file.pdf", "Title"] (see the documentation page for the other metadata elements). However, this works only if the PDF actually contains (correct) metadata. In general, I would advise using dedicated reference-management software (there are plenty besides Mendeley), some of which can extract information directly from the PDF text or obtain it by querying the DOI.
    – Domen
    Commented Jan 10 at 10:17
  • If it is a research article, you could use the CrossRef service to look up the required information. Alternatively, you could import the text of the first page and ask your favorite LLM to extract the relevant data and return it in your desired data structure.
    Commented Jan 10 at 14:06
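The CrossRef suggestion can be sketched in Wolfram Language roughly as follows. This is a minimal sketch under a few assumptions: the article's own DOI is printed somewhere in the imported text (often on the first page), and CrossRef's public REST endpoint `https://api.crossref.org/works/<DOI>` returns a JSON record whose bibliographic fields sit under the `"message"` key. The helper names `findDOIs`, `parseCrossRef`, and `lookupDOI` are hypothetical, and the DOI regular expression is a common heuristic rather than a complete DOI grammar.

```mathematica
(* Hypothetical helpers: find DOIs in article text and look them up on CrossRef. *)

(* Candidate DOIs via a heuristic pattern; trailing ".", "," or ";" is stripped
   because it is usually sentence punctuation rather than part of the DOI. *)
findDOIs[text_String] := DeleteDuplicates @ StringReplace[
   StringCases[text, RegularExpression["10\\.\\d{4,9}/[-._;()/:A-Za-z0-9]+"]],
   RegularExpression["[.,;]+$"] -> ""]

(* Pick bibliographic fields out of the "message" part of a CrossRef works record.
   Fields absent from the record become Missing["NotAvailable"]. *)
parseCrossRef[msg_Association] := <|
  "Title" -> First[Lookup[msg, "title", {}], Missing["NotAvailable"]],
  (* authors are associations with "given"/"family" keys; join whichever exist *)
  "Authors" -> Map[
    StringRiffle[DeleteMissing[{Lookup[#, "given"], Lookup[#, "family"]}], " "] &,
    Lookup[msg, "author", {}]],
  "Journal" -> First[Lookup[msg, "container-title", {}], Missing["NotAvailable"]],
  "Volume" -> Lookup[msg, "volume", Missing["NotAvailable"]],
  "Issue" -> Lookup[msg, "issue", Missing["NotAvailable"]],
  "Pages" -> Lookup[msg, "page", Missing["NotAvailable"]],
  (* "issued" looks like <|"date-parts" -> {{year, month, day}}|>; keep the year *)
  "Year" -> First[
    First[Lookup[Lookup[msg, "issued", <||>], "date-parts", {}], {}],
    Missing["NotAvailable"]]
|>

(* Query the public CrossRef REST API (requires internet access);
   returns $Failed if the response is not a JSON record with a "message". *)
lookupDOI[doi_String] := Module[{data},
  data = Import["https://api.crossref.org/works/" <> doi, "RawJSON"];
  If[AssociationQ[data] && KeyExistsQ[data, "message"],
    parseCrossRef[data["message"]],
    $Failed]
]
```

As a usage sketch, assuming the "Plaintext" element is available for PDFs in your version, `dois = findDOIs[Import["file_name.pdf", "Plaintext"]]` collects candidate DOIs, and if that list is non-empty, `lookupDOI[First[dois]]` returns an association with the title, authors, journal, volume, issue, pages, and year. Taking the first DOI is only a heuristic: the reference list of a paper contains many DOIs, so it is worth checking that the one chosen belongs to the article itself.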

