12

Recently, I've come across various AI tools that claim to assist in conducting literature reviews by quickly summarizing research papers, identifying key themes, and even suggesting relevant literature.

While these AI tools appear to be promising in terms of efficiency and comprehensiveness, I'm concerned about their reliability and the potential impact on the quality of the literature review. My questions are:

  1. Is it advisable to use AI tools for literature reviews in academic research?
  2. How do these tools compare to traditional methods in terms of accuracy and depth of analysis?
5
  • 10
    IMHO, reviews published in journals should only be written by people who are experts on the topic (who probably don't need AI). Otherwise the review is likely to be unreliable (at best) and likely to mislead students and early career practitioners. My view of these AI tools in general is that they are fine for automating the drudgery of things you already know how to do well, but you can't rely on the accuracy of what they produce, so you need to be expert enough to safely validate it yourself. If it is for a student report or dissertation, this is less of a consideration. Commented Jan 17 at 11:49
  • 3
    If you are able to successfully use AI to do any meaningful literature review, please let us know how. I have tried this (for theoretical computer science / formal methods), with no success. Commented Jan 17 at 16:03
  • By "AI", do you specifically mean language models, or AI in general? Because auto-summarization and semantic analysis are things, and the answer will be (somewhat) different if that's what you mean.
    – Ray
    Commented May 7 at 16:31
  • I meant AI in general. @Ray Commented Jun 8 at 13:48
  • @ImanMohammadi Then Ben's answer is right. Autosummarization and keyword extraction could probably be used to filter a large number of papers into a shortlist of papers that are more likely to be relevant. (Whether any particular tool is better at that than existing search engines is another question, but there's no technical reason why they couldn't be.)
    – Ray
    Commented Jun 8 at 21:41

9 Answers

50

You can use any tool you want to identify papers, but you have to read them yourself

It really depends on what you propose to use an AI tool for. To write a literature review you want to (a) identify relevant papers; and (b) read those papers and summarise them in an appropriate way. An AI tool might potentially be useful in identifying relevant papers, but it is not a substitute for reading those papers yourself. Here it is notable that AI tools have a history of "hallucination" whereby they may incorrectly summarise a resource or even invent a resource that does not actually exist.
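
If you do want to automate the "identify" step, a plain keyword search against an open scholarly index already produces a candidate list without any generative step at all; the reading and summarising still has to be done by you. Below is a minimal Python sketch that assumes the public Semantic Scholar Graph API search endpoint (the endpoint and field names are my assumption for illustration, not an endorsement of any particular tool):

    # Sketch: shortlist candidate papers for a topic; the papers still have to be read by a human.
    # Assumes the public Semantic Scholar Graph API; treat the endpoint and fields as illustrative.
    import requests

    def shortlist_papers(query, limit=20):
        resp = requests.get(
            "https://api.semanticscholar.org/graph/v1/paper/search",
            params={"query": query, "limit": limit,
                    "fields": "title,year,externalIds"},
            timeout=30,
        )
        resp.raise_for_status()
        papers = resp.json().get("data", [])
        # This only produces candidates; relevance and content must be judged by reading.
        return [(p.get("year"), p.get("title"),
                 (p.get("externalIds") or {}).get("DOI")) for p in papers]

    if __name__ == "__main__":
        for year, title, doi in shortlist_papers("hallucination in large language models"):
            print(year, doi, title)

Nothing in this sketch summarises anything, which is deliberate: the summarising is the part you cannot delegate.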

6
  • 27
    So basically, use AI tools as search engines, not as copywriters.
    – vsz
    Commented Jan 16 at 11:16
  • 2
    @vsz - and at that point one should just start with a good research librarian and an appropriate literature search engine, not ChatGPT and the like.
    – Jon Custer
    Commented Jan 16 at 14:07
  • 16
    I currently treat AI tools the way we were always warned to use Wikipedia back in the early days - you can't trust it to not be completely invented, even though it may look entirely plausible, so use it as a precompiled starting point but verify everything before you rely on it. As @Jon Custer said, at that point it really only makes sense to use if it's your best available starting resource. That may indeed be the case, but caveat emptor
    – Nick J
    Commented Jan 16 at 15:09
  • 6
    @NickJ : This is a good practice for Wikipedia even today, especially around topics which generate lots of political and ideological interest.
    – vsz
    Commented Jan 16 at 15:23
  • 8
    @vsz That's a good practice. Period. If you want to rely on something, verify it. No need to specify that that's exclusive to Wikipedia and LLM-generated content; it's just as true for research articles (who wrote it and what are their interests? Does it align with other articles?). Obviously it depends on 'the thing' and where it came from how much effort should go into verifying it (for example, Wikipedia is statistically more trustworthy than most sources, but it still deserves verification, and LLM-generated content is significantly less trustworthy). Commented Jan 16 at 15:43
23

To be brief, if used appropriately, AI tools are fantastic and can be used to improve the speed of development and the quality of literature reviews. But if used inappropriately, AI tools can speedily lead you into plagiarism and academic fraud with all their disastrous consequences.

I teach how to (and how not to) use AI tools for academic research and publications, including specifically for literature reviews. I will summarize here some key points from a presentation I gave on this topic:

You can refer to the slides and recording for details.

First of all, it is crucial to generally understand when it is appropriate or not to use generative AI (not just for academic research, but in general). Here's a framework I have adapted from James Chapman to summarize some key considerations:

Flowchart: should you use generative AI?

Edit: the flowchart has been reconfigured based on comments by @BryanKrause.

For literature reviews, the key considerations from this framework are:

  • Do you need absolutely true results? Yes, we do. That's the point of a literature review. We want to know what the literature actually says.
  • Can you verify the accuracy of the results yourself? This is the key issue.

Let me first talk about the risks

Generative AI tools like ChatGPT will confidently answer questions that we ask about what the literature says. But we must be very clear and not fool ourselves: generative AI is well known to make up answers out of thin air. (The technical, polite term for this is "hallucination".) So, if we submit and publish literature analysis by tools like ChatGPT without verifying it, we have an extremely high chance of committing academic fraud.

Another serious issue is that ChatGPT and similar tools copy answers from various sources, often with minimal paraphrasing, without citing their sources. A thorough study on the topic found that "59.7% of GPT-3.5 Outputs Contained Some Form of Plagiarized Content".

However, it is crucial to not make the mistake of thinking that AI tells lies or that it plagiarizes. In fact, AI cannot tell lies or plagiarize. Only humans who have an ethical sense of what is true or false and a consciousness of when we are telling the truth, exaggerating, being dishonest, sneaky, cheating, etc. can tell "lies" or "plagiarize". AI has no ethical sense whatsoever. It is just following instructions and random fluctuations to give answers to our questions with no inherent sense of whether the answer is true or false. That's how it works. So, if we submit answers from generative AI that are plagiarized or have false information, we, the academic authors who chose to use generative AI, are 100% responsible. We can't say, "ChatGPT made me do it."

So, if we want true answers, we need an "oracle", that is, an authoritative source that we can depend on to verify if the answers are true or not. That oracle must be US HUMANS, the literature review authors. No AI can do this for us. But--here's the good news--AI can most certainly help us.

How to use generative AI appropriately

Because of these well-known risks of generative AI, many tools are rapidly being developed that help us to use generative AI appropriately. I can't keep up with all of them, but two of the best ones that I recommend are Consensus and ScholarAI. The key idea is the same:

  • When we ask them questions about the literature, they only give answers for which they can find support in the literature.
  • They cite and link the literature to support their answers so that we can verify their answers ourselves.

The second point is crucial. If I didn't sufficiently scare you with my first part above, let me repeat: we should never trust unverified answers produced by generative AI. We must verify everything it tells us ourselves and take full responsibility for it. These excellent tools make it very easy to do that. They identify literature much faster and in more detail than we can easily do ourselves. As long as we verify everything these tools give us, we can find them to be extremely helpful research assistants.
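
One mechanical first pass on that verification, offered purely as an illustration: since these tools return citations, you can at least confirm that every cited DOI resolves to a real record before you start reading. The sketch below assumes the public Crossref REST API (my choice for the example, not something Consensus or ScholarAI themselves require):

    # Sketch: check that DOIs cited by an AI assistant actually resolve in Crossref.
    # Resolving is necessary but not sufficient: you still have to read each paper
    # and confirm it supports the claim the tool attached to it.
    import requests

    def doi_exists(doi):
        resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=30)
        return resp.status_code == 200

    cited_dois = ["10.1234/placeholder-doi"]  # replace with the DOIs the tool cited
    for doi in cited_dois:
        print(doi, "resolves" if doi_exists(doi) else "NOT FOUND - check by hand")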

Again, my linked presentation and recording give much more detail on further risks, how to avoid them, and how to effectively use these tools, with a brief case study of an article I wrote with extensive assistance of generative AI.

12
  • 3
    That flowchart is poor because the concerns expressed on the right hand side are also valid on the left, but are skipped entirely if you take that route. It would be far better to express the same information without the flow chart and make clear those are all concerns.
    – Bryan Krause
    Commented Jan 16 at 15:30
  • 2
    Using GenAI for data cleaning (just one example).
    – Bryan Krause
    Commented Jan 16 at 15:42
  • 2
    @BryanKrause You're right; I'll need to rework my flowchart. Thanks. If you think of any other examples, I'd appreciate if you could add them to these comments.
    – Tripartio
    Commented Jan 16 at 15:51
  • 1
    I'm having trouble thinking of any cases where data are used and the results don't need to be accurate. Sorting resumes/applications, grading papers, any research project. Organizing or prioritizing your email (in particular as a professor in the US: FERPA).
    – Bryan Krause
    Commented Jan 16 at 15:53
  • 3
    @BryanKrause A huge use of generative AI is generating ideas such as strategies, inspiration, suggestions for how to structure articles, etc. There is no strictly true or false answer for these.
    – Tripartio
    Commented Jan 16 at 15:57
9

I feel like this has a risk of being more work than a non-AI "assisted" review. As mentioned in other answers, AI tools are prone to "hallucinating" incorrect information. In fact, the first time I'd seen someone use ChatGPT for "hey, can you find me a paper on XYZ", it returned a plausible-looking paper, complete with BibTeX-formatted citation... that didn't exist (for more examples of hallucinated references see e.g. this Retraction Watch article). So for any tool that works along similar lines (or is just a wrapper around ChatGPT) if you ask it to point you at papers, your first step now will (well, should) be looking up whether all the papers even exist. That seems to me like it'd add an extra step compared to looking for related literature on Pubmed or the like "by hand", where at least you'll know what you get exists to begin with.
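
If you do end up scripting that extra verification step, it might look roughly like the following. This sketch assumes the public NCBI E-utilities esearch endpoint for PubMed and only checks whether any record matches the suggested title, which tells you a paper with that title exists but not that the citation details or the claimed content are right:

    # Sketch: coarse existence check, against PubMed, for a title an AI tool suggested.
    # Assumes NCBI E-utilities; zero hits means "almost certainly hallucinated",
    # and nonzero hits still require reading the actual record.
    import requests

    def pubmed_hits(title):
        resp = requests.get(
            "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
            params={"db": "pubmed", "term": f'"{title}"[Title]', "retmode": "json"},
            timeout=30,
        )
        resp.raise_for_status()
        return int(resp.json()["esearchresult"]["count"])

    suggested_title = "A plausible-sounding paper title from the AI"  # replace as needed
    print(suggested_title, "->", pubmed_hits(suggested_title), "PubMed hit(s)")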

8

Just as I make intense use of Wikipedia for suggestions and pointers to things that I might not have been aware of, I'd suggest using the various AI tools similarly. For that matter, other people's opinions! :) Follow up on things yourself, and don't rely on what amounts to "hearsay". :)

5

As of now, I advise strongly against using Generative AI in most forms of research because of the risk of hallucinations. But the technology develops quickly and reasonable people may hold other opinions.

For background, I am not an academic in the traditional sense. I am a practicing lawyer (licensed only in Nevada) who works in technology law and used to be a computer programmer. I also publish occasionally in law review journals and similar academic outlets.

I have tested multiple versions of generative AI to see if they could be used in my practice. The answer I have come to each time is that generally they are far more trouble than they are worth at this point in time.

As several other answers have mentioned, current iterations of generative AI have a tendency to "hallucinate". This is a complex topic, but essentially it means that they can frequently give answers that appear correct while being wildly incorrect. I have previously posted a video about why I believe this makes current AI more dangerous than useful. That video is now slightly dated, even though it was only published a few months ago, and the available generative AI has improved since then; but my fundamental opinion, that currently available models are more trouble than they are worth, remains the same as I continue to observe and test various versions. This, however, is likely to change as the technology continues to improve.

Opinions can reasonably differ. I know some lawyers who think these AIs are useful sounding boards. That is a reasonable opinion, but it must be approached with great caution. It is my opinion that, with the present, fairly high risk of hallucination, verifying what the AI provided can be harder than simply doing the research from scratch. But if you choose to use generative AI for anything where getting the facts correct matters, then everything the AI outputs should be very carefully vetted and checked.

Also, I am responding to the call of the question which focused on reliability and impact on the quality of the research. There are varying opinions regarding the ethics of the way present AIs were created as well as how they can be used that I won't address here beyond acknowledging their existence. Also, nothing I say on this forum should ever be viewed as legal advice, but works created by generative AI are most likely not able to be protected by copyright. This may be a substantial problem for anyone hoping to have generative AI create a large portion of a paper for publication.

2

Short version: This is an experimental technology and you should use it like you would an experimental gun - use a light load, wear goggles, check it after every shot, and don't rely on it without a backup.

Long version: ChatGPT 4 Turbo (paid API version) uses RAG - Retrieval Augmented Generation - which significantly cuts down on hallucinations, if prompted correctly. But it only currently has a memory limit of 70,000 tokens, or approximately 50,000 words. It's much less for the web version, where amnesia begins between 5,000 and 15,000 tokens. When it processes papers, or multiple papers, or various sources that total more than that, it begins to hallucinate even if set up with firm instructions to only use real data.
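
A quick way to see whether a given paper would even fit in such a window is to count its tokens before sending it. Here is a minimal sketch using the tiktoken library; the 70,000-token budget simply mirrors the figure above, and the cl100k_base encoding is an assumption about which tokenizer applies:

    # Sketch: count tokens in a paper before deciding whether it fits the model's window.
    # The 70,000-token budget mirrors the figure cited above; adjust for your model.
    import tiktoken

    def fits_in_window(text, budget=70_000, encoding_name="cl100k_base"):
        enc = tiktoken.get_encoding(encoding_name)
        n_tokens = len(enc.encode(text))
        print(f"{n_tokens} tokens against a budget of {budget}")
        return n_tokens <= budget

    paper_text = open("paper.txt", encoding="utf-8").read()  # plain-text dump of the paper
    if not fits_in_window(paper_text):
        # Material past the window is invisible to the model, which is where
        # confident-but-fabricated summary content tends to creep in.
        print("Too long: split the text or expect hallucinated filler.")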

There is a different LLM well-suited for processing long or multiple papers, Claude, which has a window of up to 200,000 tokens.

If you rely on AI to give you a summary it can't produce, either because it lacks the appropriate quasi-cognitive function (e.g., ChatGPT can't process math papers) or because the window size is insufficient, it will make up whatever it can.

If you do understand the limitations of the exact model you're using, then it's fine to use them to a significant extent, as long as you mention that. Given how complex the field is, you'd probably have to be an AI researcher - this answer is 0.1% of what you'd need to know.

If you don't understand what the model is doing, how it works, how large the context window is, and what is currently in-context and what out of it - then don't trust GAI. Use it as a search tool, but always check the paper you're summarizing yourself.

Even if you see an exciting 85% accuracy rate, the 15% that GAI is creating unconstrained by RAG can turn out to be the perfect answer to your research question: exact, complete, well-explained, and completely wrong.

0

I don't advise use of AI tools for anything like you've mentioned.

4
  • 6
    Can you back this up or is this just an opinion?
    – Mayou36
    Commented Jan 16 at 15:10
  • 1
    One could assert that the built-in spelling and grammar checks in Word are "AI" in their suggestions. But one still takes them with a grain of salt.
    – Jon Custer
    Commented Jan 16 at 18:19
  • 3
    @JonCuster You could, but I wouldn't find it a very interesting discussion. Commented Jan 16 at 23:06
  • @Mayou36 I meant to return to this answer, but didn't. Now, people have provided many more detailed answers and I would just be adding similar statements. I've upvoted those I agree with. Commented Jan 16 at 23:11
0

GenAI does not understand, does not reason. It's basically a parrot on steroids, repeating and creating from learned content based on probabilities.

Thus you cannot rely on it to follow arguments or scientific discourse, and therefore it seems unsafe or unreliable to use it in a literature review.

Say, for example, there is a flaw in the line of reasoning of a paper you are citing, a wrong assumption, or fake data that is easily identifiable. As a human reader you should be able to spot this, and your review would mention your concerns (or discard the work entirely as part of the review); an LLM might just blindly summarize the work even though it's bogus.

-4

This is a value judgment you would have to make, there is no gold standard answer.

However before you do make a decision, take into account what AI is:

Artificial intelligence (AI), the ability of a digital computer or computer-controlled robot to perform tasks commonly associated with intelligent beings. The term is frequently applied to the project of developing systems endowed with the intellectual processes characteristic of humans, such as the ability to reason, discover meaning, generalize, or learn from past experience.

In other words, it is designed to learn from your interactions with it: the more you utilize it, the more it learns to be better than you at the task you are utilizing it for. Basically, you're teaching it to learn how to be better than you. So, the question of who is utilizing whom ought to be answered beforehand, especially because those AI tools didn't appear ex nihilo; some entity, in most cases one with a pecuniary motive, designed them.

Take this case for an example:

You've probably heard the story: A young buck comes into a new job full of confidence, and the weathered older worker has to show them the ropes—only to find out they’ll be unemployed once the new employee is up to speed. This has been happening among humans for a long time—but it may soon start happening between humans and artificial intelligence.

Put all of this together and there’s the potential that companies could use data they’ve harvested from workers—by monitoring them and having them interact with AI that can learn from them—to develop new AI programs that could actually replace them. If your boss can figure out exactly how you do your job, and an AI program is learning from the data you’re producing, then eventually your boss might be able to just have the program do the job instead.

In closing, while AI tools do offer the advertised benefits, one must be cognizant of the unadvertised costs too.

6
  • 2
    This answer is not completely accurate. The majority of AI systems are not continuously learning by interacting with users. This would require a massive amount of computing power. Even models like ChatGPT can be used in a safe, privacy-protected manner. My university wrapped it up with their own interface and all data are encrypted.
    – The Doctor
    Commented Jan 16 at 19:40
  • @TheDoctor If it really is OpenAI's ChatGPT, you've been misled. OpenAI promises not to use data given to their API for training any more, but they used to, and regular ChatGPT does use information for this purpose.
    – wizzwizz4
    Commented Jan 17 at 3:29
  • That said, @Ptah-hotep: while “it is designed to learn from your interactions with it” is true (in a sense, depending on what we mean by “it”, “designed” and “learn”) of many such systems, that's not anything inherent or intrinsic to it being AI. It's a property of the particular computer system.
    – wizzwizz4
    Commented Jan 17 at 3:30
  • 1
    ChatGPT is not the only LLM available. I run LLAMA2 and many other models on my own instances. There is absolutely no learning involved in these models unless I want to fine-tune them. Nonetheless, my institution hosts their own ChatGPT and they promise that no data is sent to OpenAI; everything is done locally.
    – The Doctor
    Commented Jan 17 at 7:04
  • 1
    @wizzwizz4 Even when they were using user data, they were not using it for continuous learning. What they meant by using user data is that they would collect your interactions, and may or may not choose to use them as part of the dataset for the next generation of models. It's a significant difference. It's generational, not continuous. Commented Jan 17 at 13:28
