Subject coverage

This recently published article examines subject coverage in Google Scholar:

Neuhaus, Chris, Ellen Neuhaus, Alan Asher and Clint Wrede (2006). The Depth and Breadth of Google Scholar: An Empirical Study.
Portal: Libraries and the Academy, Vol. 6, No. 2, pp. 127-141.

Fifty article titles were randomly sampled from each of 47 databases and checked against Google Scholar. Look at the figures here:

Conclusions: Across the 47 databases (2,350 randomly selected articles in total), Google Scholar had a median and average coverage of 60%.
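The arithmetic behind such summary figures can be sketched as follows. This is only an illustration of how a per-database coverage percentage, and the median and average across databases, might be computed. The function names and the toy counts are invented for the example and are not data from the study.

```python
from statistics import mean, median

SAMPLE_SIZE = 50   # titles sampled per database (from the study design)
N_DATABASES = 47   # databases examined (from the study design)


def coverage_percent(found: int, sampled: int = SAMPLE_SIZE) -> float:
    """Share of sampled titles that were also found in Google Scholar."""
    if not 0 <= found <= sampled:
        raise ValueError("found must be between 0 and the sample size")
    return 100.0 * found / sampled


def summarize(found_per_database: list[int]) -> dict[str, float]:
    """Median and average coverage across a list of databases."""
    coverages = [coverage_percent(found) for found in found_per_database]
    return {
        "total_sampled": SAMPLE_SIZE * len(found_per_database),
        "mean": mean(coverages),
        "median": median(coverages),
    }


# Total sample size implied by the study design: 47 * 50 titles.
total_titles = N_DATABASES * SAMPLE_SIZE

# Invented placeholder counts for three hypothetical databases.
toy_found = [10, 25, 40]
toy_summary = summarize(toy_found)
print(total_titles, toy_summary)
```

Reporting both the median and the average is useful because a few databases with very low or very high coverage pull the average much more than the median.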

According to Neuhaus et al., Google Scholar's coverage is weakest for social science and humanities databases and strongest for science and medical databases, open access databases, and single-publisher databases.

It is hard to make a simple yet thorough evaluation of subject coverage in Scopus, Web of Science and Google Scholar, for many reasons, and especially because these databases do not use established thesauri. I did, however, make a small comparison between these multidisciplinary databases and PubMed.

I chose three MeSH terms, each consisting of three words (two of them include a subheading). I limited my search to 1996 onward, mainly because Scopus subject coverage before 1996 is selective. The MeSH terms were:

Hormone replacement therapy
Antifreeze proteins toxicity
Neonatal screening ethics
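To make the comparison concrete, the searches could be written roughly as below in each database's query syntax. This is a sketch under assumptions: the exact query strings used for this comparison are not recorded here, the helper names are invented, and combining the words with AND in Scopus and Web of Science is a guess at how the terms were entered. The field tags themselves (`[MeSH Terms]` and `[dp]` in PubMed, `KEY` and `TITLE-ABS-KEY` in Scopus, `TS=` in Web of Science) are standard.

```python
def pubmed_query(heading, subheading=None, start_year=1996):
    """MeSH search, optionally with a subheading, from start_year onward."""
    term = f"{heading}/{subheading}" if subheading else heading
    # 3000 is used as an open-ended upper bound for the publication date.
    return f'"{term}"[MeSH Terms] AND {start_year}:3000[dp]'


def scopus_query(words, field="KEY", start_year=1996):
    """Scopus search in KEY (keywords) or TITLE-ABS-KEY."""
    joined = " AND ".join(words)
    return f"{field}({joined}) AND PUBYEAR > {start_year - 1}"


def wos_query(words):
    """Web of Science Topic search; the timespan is set in the interface."""
    joined = " AND ".join(words)
    return f"TS=({joined})"


# The three MeSH terms from the list above, split into heading and subheading.
terms = [
    ("Hormone Replacement Therapy", None),
    ("Antifreeze Proteins", "toxicity"),
    ("Neonatal Screening", "ethics"),
]

for heading, subheading in terms:
    words = heading.lower().split() + ([subheading] if subheading else [])
    print(pubmed_query(heading, subheading))
    print(scopus_query(words))                         # keywords only
    print(scopus_query(words, field="TITLE-ABS-KEY"))  # broadened search
    print(wos_query(words))
```

Only the PubMed query relies on controlled vocabulary. The Scopus and Web of Science queries match words wherever they occur in the chosen fields, which is why the hit counts are not directly comparable.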

Result from PubMed when searching the MeSH database:

Result from Scopus when searching the keywords field:

Result when broadening the search to title, abstract and keywords:

Result from Web of Science when searching Topic in General search, which includes title, abstract and keywords (author keywords and Keywords Plus):

The results in Google Scholar are harder to evaluate, because Google Scholar indexes significant parts of the full text. It is possible to limit a search to titles, but not to abstracts, for example. Much of the material Google Scholar indexes is retrieved from the open web; the rest consists of journal article references (and full text) from publishers. Google Scholar has not integrated any thesauri for the article references, however. Instead, it offers 7 subject areas as limits in advanced search. As this screenshot shows, one of the 7 subject areas is Medicine, Pharmacology and Veterinary Medicine. I limited the search to that subject area and to the timespan 1996-.

2,310 hits is definitely more than the other databases returned. As you can see, the second hit is highly relevant, but the other hits only match the word ethics in the full text, where it is part of "ethics committee", so they are not necessarily relevant.

Screenshot of the search on antifreeze proteins toxicity, showing 60 hits:

Not all of these hits are relevant, and some are from books.

Screenshot of the search on hormone replacement therapy, showing 26,200 hits:

Conclusion: Web of Science, Scopus and Google Scholar are not recommended for exhaustive, specific searches where every potentially important record of current science has to be found. This is because thesauri and controlled vocabularies are either not integrated at all or not integrated properly.

Broadening a subject search in Scopus from Keywords to Title, abstract and keywords gives higher recall, but the additional records are not always relevant. For broadening a search, both Scopus and Google Scholar are recommended, but not Web of Science, which indexes less material from 1996.

In the Scopus interface, the default Basic search offers 11 subject areas to choose from, as seen in this screenshot:

Subject classification is based on the journal title and is not always unique. For instance, ‘Nature’ is classified in several areas.
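One consequence of journal-based classification is that per-area record counts can add up to more than the number of distinct records, because a journal assigned to several areas is counted once in each. The sketch below illustrates this with invented placeholder journals, areas and counts. None of these names or numbers come from Scopus.

```python
from collections import Counter


def records_per_area(journal_areas, journal_records):
    """Count records per subject area when areas are assigned per journal."""
    counts = Counter()
    for journal, areas in journal_areas.items():
        for area in areas:
            # Every record in the journal is counted in each of its areas.
            counts[area] += journal_records[journal]
    return counts


# Hypothetical journals: "Journal B" is classified in two areas.
toy_areas = {
    "Journal A": ["Area 1"],
    "Journal B": ["Area 1", "Area 2"],
    "Journal C": ["Area 2"],
}
toy_records = {"Journal A": 100, "Journal B": 40, "Journal C": 60}

per_area = records_per_area(toy_areas, toy_records)
distinct_total = sum(toy_records.values())   # each record counted once
summed_over_areas = sum(per_area.values())   # multi-area records counted twice

print(dict(per_area), distinct_total, summed_over_areas)
```

When reading subject-area figures, it therefore helps to compare each area against the distinct total rather than against the sum of all areas.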

We ran a test on 14 March by searching with empty quotation marks (“ “) in the default field (title, abstract, keywords) and limiting by subject area. The total number of records was 27,532,954.

And here you also have a graphical overview: