6

Pick a random paper using Google Search.

Click on the Cited By link, and you will see:

  1. a list of writings, ranging from theses and conference papers to arXiv preprints, and so on

  2. published on a wide range of platforms such as academia.edu, arXiv, semanticscholar.org, IEEE, nowpublishers

  3. using a variety of citation styles

It seems to me that if this was an automated process, then Google would have to keep track of every single new paper which has been published, find the list of citations in each paper, identify each paper that has been cited, update the Cited By page for that paper, and repeat for every citation in every paper.
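The bookkeeping described above amounts to inverting a "paper cites papers" mapping into a "paper is cited by papers" mapping. Below is a minimal sketch of that idea, not a description of Google's actual system. It assumes every paper already has a unique identifier and a resolved reference list; the function name `build_cited_by` and the sample identifiers are hypothetical.

```python
from collections import defaultdict


def build_cited_by(papers):
    """Invert a mapping of paper id -> list of cited paper ids.

    `papers` is a hypothetical input format: it assumes each reference
    has already been resolved to the id of the paper it points to.
    """
    cited_by = defaultdict(set)
    for paper_id, references in papers.items():
        for ref_id in references:
            # Record that `paper_id` cites `ref_id`.
            cited_by[ref_id].add(paper_id)
    return cited_by


# Made-up example: A cites C, B cites A and C, C cites nothing.
papers = {
    "A": ["C"],
    "B": ["A", "C"],
    "C": [],
}

cited_by = build_cited_by(papers)
print(sorted(cited_by["C"]))  # papers that cite C
```

With an index like this, a newly discovered paper only needs one pass over its own reference list, so nothing has to be recomputed for the papers already indexed.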

But then it would have to gain access to those papers in the first place, and some of them are behind subscriptions, such as the IEEE ones. It would also have to ignore citation styles while still keeping track of the correct version of whichever paper has been cited (preprint, etc.).
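One plausible way to ignore citation styles is to reduce each cited title to a normalized key and compare keys instead of raw strings. This is only a sketch under strong assumptions: it supposes the title has already been isolated from the rest of the citation, and it treats a preprint and its published version as the same work whenever their titles normalize identically. The helper names `normalize_title` and `same_work` and the example titles are made up for illustration.

```python
import re
import unicodedata


def normalize_title(title):
    """Return a style-independent key for a paper title."""
    # Decompose accented characters, then drop the combining marks.
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Lowercase and replace every run of non-alphanumerics with one space.
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def same_work(title_a, title_b):
    """Assumption: identical normalized titles mean the same work."""
    return normalize_title(title_a) == normalize_title(title_b)


# Two renderings of the same made-up title in different citation styles.
print(same_work('"Widgets: A Survey."', "widgets - a survey"))
```

A real system would presumably also compare authors, years, and identifiers such as DOIs, since titles alone can collide or differ between a preprint and the final version.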

Is this really how Google keeps track of citations in the Cited By link? Can someone who has insider knowledge of publishing enlighten me as to how Google Search is able to know the citations between papers?

4
  • 1
    Yes, it is of course an automated process. I do not have firsthand knowledge, but I've been told that publishers give Google access to bibliographic data (and it's hard to see how Scholar could have this information otherwise). Most publishers make citations available on their site in addition to the bibliography in the PDF, so Google might mine it from the publisher website or the PDF. Commented Feb 26, 2017 at 5:18
  • 2
    Given the amount of money Google has, I guess that an IEEE subscription is the least of their problems. Commented Feb 26, 2017 at 11:18
  • 2
    it's hard to see how Scholar could have this information otherwise — They could parse it from other papers' bibliographies, just like humans do.
    – JeffE
    Commented Feb 26, 2017 at 16:55
  • @FedericoPoloni Yes, the point is whether there is a much easier way for this process to be done. For example, IEEE could upload the papers directly into a Google-owned database.
    – Fraïssé
    Commented Feb 26, 2017 at 19:02

1 Answer

10

It seems to me that if this was an automated process, then Google would have to keep track of every single new paper which has been published

Yes, this is exactly how Google does it. They crawl the web anyway, and if they find something that looks like an academic paper, they add it to their special Google Scholar index. Extracting citations from PDFs is not technically easy, but it is not a big barrier if you have the manpower and the years of experience in information retrieval that Google has.
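To give a rough idea of what "extracting citations" involves, here is a minimal sketch that splits a bibliography into individual entries. It assumes the PDF has already been converted to plain text, that the bibliography follows a heading line reading "References" or "Bibliography", and that entries are numbered like `[1]`. None of this is known to be what Google does; the function name `split_references` and the sample text are hypothetical.

```python
import re


def split_references(text):
    """Return the numbered bibliography entries found in `text`."""
    # Locate a heading line such as "References" (assumed layout).
    heading = re.search(
        r"^\s*(references|bibliography)\s*$",
        text,
        flags=re.IGNORECASE | re.MULTILINE,
    )
    if heading is None:
        return []
    body = text[heading.end():]
    # Split wherever a line starts with a marker like "[12]".
    parts = re.split(r"^\s*\[\d+\]\s*", body, flags=re.MULTILINE)
    # Collapse line breaks and indentation inside each entry.
    return [" ".join(part.split()) for part in parts if part.strip()]


# Made-up paper text with two bibliography entries.
sample = """Some body text citing [1] and [2].

References
[1] A. Author. A Study of Widgets.
    Journal of Examples, 2015.
[2] B. Writer, "Gadgets revisited," Proc. ExampleConf, 2016.
"""

for entry in split_references(sample):
    print(entry)
```

Real bibliographies are much messier (author-year styles, unnumbered entries, two-column layouts), which is where most of the difficulty lies.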

As for how they get access to IEEE etc.: to the best of my knowledge, this is not disclosed. Maybe they just pay for institutional access like everybody else, or maybe they get free access from the content providers so that they can build up their index.
