Google Scholar

It's not hard to find inconsistencies and flaws in Google Scholar. Some of them follow below.
This search on semiconductors is an example from Peter Jacso's "Google Scholar and The Scientist" (2005), published on a university website as extra material.

In this reference the article appears to have been published in 2006, but checking the source shows that it was published in 1990 and that 2006 is the article's starting page:

An advanced search with the date range limited to 1995-2006 returns 135,000 hits. But extending the date range to 1985-2006 returns just 131,000 hits. How come, Google Scholar?

Another flaw in Google Scholar is the Boolean OR operator. In this case the result for dahlqvist OR dahlquist is 16,200 hits. The two terms searched separately add up to 16,700 hits, so there should be 500 documents containing both dahlqvist AND dahlquist. But there are not.
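The arithmetic behind this check is plain inclusion-exclusion. Here is a minimal Python sketch of it; the function name is my own, and the only numbers used are the two hit counts quoted above (16,700 for the two terms added together, 16,200 for the OR search).

```python
def implied_and_hits(sum_of_separate_hits: int, or_hits: int) -> int:
    """Inclusion-exclusion: |A AND B| = |A| + |B| - |A OR B|.

    sum_of_separate_hits is |A| + |B|, i.e. the hit counts of the two
    single-term searches added together.
    """
    return sum_of_separate_hits - or_hits


# Hit counts from the dahlqvist / dahlquist example.
overlap = implied_and_hits(sum_of_separate_hits=16_700, or_hits=16_200)
print(overlap)
```

If the AND search returns something other than this number, the three counts cannot all be right.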

This is a quotation from Peter Jacso, "As we may search", in Current Science, Vol. 89, No. 9 (10 November), pp. 1537-1547:

"G-S is a free service, and for many who consider it to be a gift for the world it may be anathema to say any but good words of it. It is also to be emphasized that it is a joint gift by some publishers and/or their digital facilitators (the content part), and Google (the software and the service operation part). If ISI or Elsevier could have received such unfettered access to the publishers’ archives for harvesting their sites offering standard-compliant metadata, they could probably sell their services – if not for free – at a fraction of their current price. Building a multi-million record database incurs multi-million dollar investment just to subscribe to the journals, administer their processing, and record their standard bibliographic data, abstract, and descriptors, for about 1 million papers per year in the most recent period".


Of course, Google Scholar also has problems similar to those of Web of Science and Scopus when indexing the author Rantapaa Dahlqvist S. Searching the author field in Advanced Scholar Search produces the following search operators: author:Rantapaa Dahlqvist author:s, and returns 41 hits.

One of the 41 hits has both first-name initials, SB. The result also includes 7 hits on the author variant Dahlqvist SR. A search on Dahlqvist SR, by contrast, returns 28 hits on that variant. And here we find the article that was published in the Lancet and was misspelled in Web of Science as Dahlwvist SR.

In contrast, a search on Rantapaa-Dahlqvist S without ää returns just 35 hits(!).

Searching for the misspelling Rantapaa-Dahlquist S returns no hits, while the same misspelling with ää, Rantapää-Dahlquist S, returns 2 hits that are not included in the 41 hits of the Rantapaa Dahlqvist S search.

But searching Rantapää S returns 43 hits, including the misspelled Rantapää-Dahlquist.

But this is not all. Searching Rantapaa SB gives one more hit not included in the 43 hits or 41 hits mentioned before. Such a mess!

Let's try some other authors. Searching author:sojka author:p also returns hits such as P Jakubus, Z Sojka. This means that this syntax matches any record that has some author with the initial P and some author with the surname Sojka, not necessarily the same person.

To get more refined matching, use quotation marks like this: author:"p sojka". But the problem remains that you get both PA and PE Sojka. To exclude Paul E Sojka you could write: author:"p sojka" -author:"pe sojka", but you still can't be sure that all the remaining P Sojka records belong to PA Sojka. And to find records by PA Sojka you can't add the name as an OR statement. It didn't work for me. You need to do a separate search: author:"PA Sojka"
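To avoid typing mistakes in these operators, the queries can be put together with a few lines of Python. This is only a sketch: the helper author_query is hypothetical, it just builds the query string from the author: syntax described above, and it doesn't send anything to Google Scholar.

```python
def author_query(name: str, exclude: tuple[str, ...] = ()) -> str:
    """Build a quoted author search, optionally excluding name variants.

    Hypothetical helper: it only assembles the query text.
    """
    parts = [f'author:"{name}"']
    # A leading minus sign excludes records matching that author variant.
    parts += [f'-author:"{variant}"' for variant in exclude]
    return " ".join(parts)


# P Sojka without Paul E Sojka, and the separate search for PA Sojka.
print(author_query("p sojka", exclude=("pe sojka",)))
print(author_query("PA Sojka"))
```

The two queries are kept separate on purpose, since combining them with OR didn't work in my test.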

And it's not possible to limit a search to the author address. But, as you may already have learnt from previous postings, author address search in Web of Science and Scopus is too inconsistent to recommend as a method for refining your author search.
It is also worth noting that if you don't restrict an author search to the author field, either in advanced search or with the advanced search operators, you will also get hits where the author name appears in the full text of the articles, parts of which Google Scholar indexes.

Conclusion: The same problems with author search as in Web of Science and Scopus exist in Google Scholar. And as I said in a previous posting, a proper publication list from the author is the best way to be sure of finding every important article.

For many reasons it's hard to make an evaluation of subject coverage in Scopus, Web of Science and Google Scholar that is both simple and thorough, especially because the databases in question do not use established thesauri. Still, I made a small comparison between these multidisciplinary databases and PubMed.

I chose three MeSH terms of three words each (two of them including subheadings). I limited my search to 1996 onwards, mainly because Scopus subject coverage before 1996 is selective. The MeSH terms were:

Hormone replacement therapy
Antifreeze proteins toxicity
Neonatal screening ethics

Result from PubMed searching MeSH database:

Result from Scopus searching field keywords:

Result when broadening the search to title, abstracts and keywords:

Result from Web of Science when searching Topic in General search, which includes title, abstracts and keywords (author keywords and Keywords Plus):

The results in Google Scholar are harder to evaluate, because Google Scholar indexes significant parts of the full text. It's possible to limit a search to the title but not, for example, to abstracts. A lot of the material Google Scholar indexes is retrieved from the open web, and other material consists of journal article references (and full text) from publishers. Google Scholar has not integrated any thesauri for the article references, however. Instead there are 7 subject areas available as limits in advanced search. As seen in this screenshot, one of the 7 subject areas is Medicine, Pharmacology and Veterinary Medicine. I limited the search to that subject area and the timespan 1996-.

2310 hits is definitely more than the other databases returned. As you can see, the second hit is of high relevance, but the other hits match the word ethics in the full text, where it is part of "ethics committee" and not necessarily relevant.

Screenshot of the search on antifreeze proteins toxicity shows 60 hits:

Not all of these hits are relevant and some are hits from books.

Screenshot of the search on hormone replacement therapy shows 26,200 hits:

Conclusion: Web of Science, Scopus and Google Scholar are not recommended for exhaustive, specific searches where every important record of current science has to be found. This is because thesauri and controlled vocabularies are either not integrated at all or not integrated properly.

Broadening a subject search in Scopus from Keywords to Title, abstracts and keywords gives higher recall, but not all of the additional records are relevant. For broadening a search, both Scopus and Google Scholar are recommended, but not Web of Science, which indexes less material from 1996.

Union catalogs from Hungary, Iceland, Israel, Portugal, Sweden, and Switzerland are now visible in the Google Scholar interface, with links to bibliographic information and items. Try searching for information science and, if you're coming from a Swedish domain, you will find links to Library Search (Sweden) for some of the hits.

If, for example, you’re coming from the Swedish domain but want to check items from other union catalogs, just go to Scholar Preferences and search for the country to configure the union catalog you want. Then save your preferences.

Make the same search on information science and you will get the link Find in RERO if you have chosen Schweiz (Switzerland), as in my example.

Read the announcement "Global searches go to local libraries" on the official Google blog, 2/20/2006.

We made a free-text search on headache in PubMed, limited to an Entrez date within the last 60 days, which returned 442 records. We checked the availability of the PubMed records (in descending order) in Google Scholar until we reached the likely breakpoint, and then checked the EDAT, which is the date and time when the record was added to PubMed.

Google Scholar has indexed 1 of 5 articles published 2005/12/29 09:00 GMT-8. The screenshot below shows both the Google Scholar reference and a clipping from part of the PubMed reference:

From 2005/12/28 09:00 and going back one further day, everything was indexed, with the exception of one article.

PMID 16375021 from 2005/12/27 doesn’t exist in Google Scholar, but the rest of the articles from the same date exist.

From 2005/12/31 09:00 and forward for at least one further date, nothing was indexed by Google Scholar (2005/12/30 had no records in this search).

This means that Google Scholar, at least in this single test on 2006-02-01 11:00 GMT+1, has an update gap of more than one month. I did a similar undocumented test on 2005-10-07, which showed that the latest update in Google Scholar was from 2005-08-17. Discovering whether PubMed records are added to Google Scholar regularly or irregularly requires repeated tests over a longer period.
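The size of the gap in the two tests can be worked out with a few lines of Python. This is a rough sketch that counts whole calendar days and ignores the clock times and time zones mentioned above; the function name is my own, and the dates are the test dates and the newest indexed dates reported in this post.

```python
from datetime import date


def update_gap_days(test_date: date, newest_indexed: date) -> int:
    """Days between the test and the newest record found in Google Scholar."""
    return (test_date - newest_indexed).days


# (test date, newest indexed date) for the two tests described above.
tests = {
    "2006-02-01 test": (date(2006, 2, 1), date(2005, 12, 29)),
    "2005-10-07 test": (date(2005, 10, 7), date(2005, 8, 17)),
}

for label, (tested, newest) in tests.items():
    print(label, update_gap_days(tested, newest), "days")
```

Repeating the same calculation for a series of test dates would show whether the gap stays stable or varies.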

Google Scholar went international on Jan 11th, when it added the Nordic languages Finnish, Swedish, Danish and Norwegian. Gary Price wrote on 11 Jan in the Search Engine Watch Blog that just two languages were included, but when I checked, all four languages were covered. Check the following screenshots.

Searching kirjasto on

Searching semantiska webben on

Searching uddannelse on

Searching sykepleiere on

When you check the screenshot for Swedish Google Scholar, you can see that the first hit is an article about the semantic web (Swedish: semantiska webben). I wrote the article for a computer magazine called Datormagazin. It's not scientific in any respect and of course not peer-reviewed, but it has been cited, for example, by the master's thesis (Swedish: magisterexamensuppsats) Ontologier i kunskapsorganisation by Irene Granström. Swedish master's theses are not considered to be scientific. This is an example of the broad scope of indexing that Peter Jacso, for example, criticized in his evaluations of Google Scholar.

The ranking algorithm determines the relevance order of assessed and non-assessed research, lower-level student papers, preprints, popular science articles and so on. Still, the breadth of Google Scholar compared with Scopus and Web of Science could be useful if users have the skills to assess the content themselves.

I also sent some questions to Anurag Acharya:

Lars: Could you give an example of the Swedish publishers you work with? (In this case I wanted to know whether Google Scholar cooperates with Swedish publishers publishing works in Swedish.)

Anurag: We are not sharing a list of publishers at this time.

Lars: Did you work with anyone fluent in Swedish to implement Swedish GS?

Anurag: No. Note however that scholarly articles are remarkably similar in structure across many languages and most of the issues are common.

Lars: How do you restrict a search to Swedish, Finnish, Danish or Norwegian documents?

Anurag: This is not possible at this time. We may add this in the future.

Lars: Now that the Swedish characters å, ä, ö are implemented, will you connect different spellings of author names to the same search? For example, so that söderström also gives hits on soederstroem? It's not done with rantapää.

Anurag: We have implemented several cases of diacritical normalization. Would appreciate suggestions for others that we may have missed.

Some further questions have not been answered. I will publish them here if I get them answered.
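To make the last question concrete, here is a small Python sketch of what connecting different spellings could look like. It is my own illustration and not how Google Scholar does it: the transliteration table (å to aa, ä to ae, ö to oe) is an assumption based on the söderström example, and the function names are hypothetical.

```python
import unicodedata

# Assumed transliterations, following the soederstroem example above.
TRANSLITERATIONS = {"å": "aa", "ä": "ae", "ö": "oe"}


def strip_diacritics(name: str) -> str:
    """Remove the diacritical marks, e.g. rantapää -> rantapaa."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def transliterate(name: str) -> str:
    """Replace å, ä, ö with two-letter forms, e.g. söderström -> soederstroem."""
    composed = unicodedata.normalize("NFC", name)
    return "".join(TRANSLITERATIONS.get(ch, ch) for ch in composed)


def spelling_variants(name: str) -> set[str]:
    """Lower-cased spellings that one author search could treat as equal."""
    lowered = unicodedata.normalize("NFC", name).lower()
    return {lowered, strip_diacritics(lowered), transliterate(lowered)}


print(sorted(spelling_variants("Söderström")))
print(sorted(spelling_variants("Rantapää")))
```

A search that expanded an author name into all of these variants would at least catch the rantapää and rantapaa spellings in one go.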

A surprised researcher at my university told my colleague some days ago, after searching for her own name in Google Scholar: "I didn't write that article!" Her name is Berit Ardlin and her first-name initials are BI. Look at this screenshot from Google Scholar.

When you click the link you get the article, but with other authors. Check the reference in PubMed, for example. So how come? Google Scholar indexes the full text of articles (some from the proprietary web, some from the open web) up to a certain limit of KB. The full text is often visible in the search results because your search keywords exist in it. In this case BI Ardlin and the other authors, M Braem, B Van Meerbeek, JE Dahl etc., should simply appear in the full text. But checking the full text of the article doesn't give any hits on Ardlin. So where do these author names come from? From some other full text? Does anyone have a clue, or is it just Anurag Acharya at Google Scholar who has the answer?

At least the dots before the "false" author names and the dots after them (in front of the journal name) indicate that they come from some other text.
This is just an example of how the presentation of search results, due to the ambition to index full text, sometimes makes Google Scholar very confusing.
