Web of Science has launched Author Finder, and Scopus has launched Author Identifier. Both tools are meant to help when an author’s name is listed in many variants. You can read more about them in the news flash Authors, Authors: Thomson Scientific and Elsevier Scopus Search Them Out by Barbara Quint, 24 July 2006.

So let’s evaluate these tools using examples from my other author and address search evaluations in Web of Science and Scopus. Let’s start with my extreme author search example, Rantapaa-Dahlqvist, S. This is a screenshot from Web of Science when searching Rantapaa in the Author Index, which I usually recommend for more complete author searches:

Here’s a screenshot of the first step in Author Finder in Web of Science, entering the last name and first initial, Rantapaa-Dahlqvist, S.:

Step 2 displays author variants. But in this case Author Finder can’t find misspelled variants of the last name, just variants of the first name initials. It retrieves 51 records for Rantapaa-Dahlqvist, S and 1 record for Rantapaa-Dahlqvist, SSRD:

Step 3 gives you the option to choose subject categories. Those categories are based on the subject of the journal the article was published in, not on the article itself:

Step 4 is a valuable refine option called Select Institution. Here you can choose the author affiliation, but it’s not always possible to refine at department level, even if the department exists on the record. The confusing thing is that you get the affiliations of all authors of all articles, not just those of the author you’re searching for. One benefit of this refine option (which also exists in Refining your results) is that you get a list of possible synonymous addresses as well as misspelled or incomplete ones. If you use this option to refine your search, be aware that not all records have complete author addresses, so refining can make you lose important records:

Let’s try searching for Stegmayr, B. You also get options for Stegmayr BG and Stegmayr BK. If you choose all variants (Stegmayr, B; Stegmayr, BG; Stegmayr, BK) you get 236 records. Trying to refine by address in step 4 is just a mess, because all the author addresses from the articles are displayed. Both Stegmayrs are active at the same university department. Stegmayr, BK has written one article with the address DEPT INTERNAL MED, but when refining with that department name you get 2 other records instead.

Let’s search for Haglin, L with Author Finder and refine by address. In this screenshot we choose UNIV UMEA HOSP, UMEA UNIV, CTY COUNCIL VASTERBOTTEN, UNIV HOSP:

That search returns 7 hits which means 4 older records are not returned. Compare results here:

This is one of the records that will be missing because Haglin’s affiliation at that time was in Uppsala:

When searching Astrom, S in Author Finder in Web of Science there are lots of options for refining at affiliation level. Refining to UNIV HOSP also retrieves Astrom, Siv at the Department of Ophthalmology, not just Astrom, Sture at the Nursing Department. There is no option to choose just UMEA, and if you take UMEA UNIV you miss Astrom, Siv. Strangest of all, DEPT OPHTHALMOL isn’t offered as an option, though other departments like DEPT MICROBIOL are displayed. Clearly, this refine option on institution name is just a mess!
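The reason these institution refinements misfire can be sketched in a few lines of Python. This is only an illustration under assumptions: the record layout, the function name `refine_by_address` and the co-author "Coauthor X" are made up, and nothing here claims to be how Web of Science actually stores or filters its data. The point is simply that when addresses belong to the article rather than to a specific author, a refine step can match on a co-author's address and silently drop records that have no address at all.

```python
def refine_by_address(records, author, address_term):
    """Record-level refine: keep records by `author` where ANY address on the record matches."""
    return [
        record["id"]
        for record in records
        if author in record["authors"]
        and any(address_term in address for address in record["addresses"])
    ]


# Hypothetical records, invented for illustration only.
records = [
    {"id": 1, "authors": ["Astrom S"], "addresses": ["UMEA UNIV, DEPT NURSING"]},
    {
        "id": 2,
        "authors": ["Astrom S", "Coauthor X"],
        "addresses": ["UMEA UNIV, DEPT NURSING", "UNIV HOSP, DEPT MICROBIOL"],
    },
    {"id": 3, "authors": ["Astrom S"], "addresses": []},  # record without any address
]

# Record 2 matches UNIV HOSP only because of the co-author's address.
print(refine_by_address(records, "Astrom S", "UNIV HOSP"))

# Record 3 is lost when refining on UMEA UNIV because it carries no address.
print(refine_by_address(records, "Astrom S", "UMEA UNIV"))
```

In this toy data the first refine returns record 2 and the second returns records 1 and 2, so one hit comes from the wrong person's address and one genuine record disappears.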

Searching for author Bernspang B in Cited Ref Search returns a list that also includes one entry for Bernsprang B:

The entry Bernsprang B is a misspelled citation for her thesis:

Author Finder does not retrieve the misspelled Bernsprang when you search for Bernspang:

Searching Lundin Olsson or Lundin-Olsson returns no hits in either General Search or Cited Ref Search. Searching LundinOlsson in Author Index retrieves both LundinOlsson, L and LundinOlsson, I:

Checking the variant with I as first initial indicates it’s an incorrect citation:

Checking Scopus shows that the reference is correct in their reference list:

So let’s check the Author Identifier in Scopus. It’s integrated in Author Search. A search for Rantapaa-Dahlqvist returns three variants:

But the truncated search Rantapa* S* returns both variants, Rantapaa-Dahlqvist and Rantapaa-Dahlquist:

But you also need to search Dahlqvist* s*:

An advantage of Scopus compared to Web of Science is the possibility to search with the full first name. But even that doesn’t always help. Searching Eriksson, Sture gives the following screenshot:

Choosing the first hit gives 66 records, three of which are associated via Eriksson, S-E. That is not the same person as Eriksson, Sture at Umea university. Eriksson, S-E’s affiliation in 2001 was Falun hospital:

Hit 9 doesn’t show an affiliation in the result list, but when you check the hit, the record itself has an affiliation attached:

Checking hit 2 shows that it also has an affiliation on the record and turns out to be Eriksson, Sture at Umea university. Hits 3 and 10 are Eriksson, Sture at Umea university too, but none of them is connected to the first hit of Eriksson, Sture with 66 records. That’s not an improvement, that’s confusing.

The good thing, though, is that Eriksson, Staffan (same first initial as Sture), at the same department as Eriksson, Sture, is not connected in the search for Eriksson, Sture. That is a problem in Web of Science, as they do not use the full first name.

Conclusion: Author Finder in Web of Science is unreliable for detecting possible misspellings of author names. The Institution Name refine option in Author Finder presents a nice overview but is really hazardous, because the addresses refer not just to the author but to all co-authors as well! It is not always possible to include all address information, and there are no options for excluding addresses. I suggest using Author Index instead.

Scopus Author Identifier has some improvements, though there are still serious flaws. Misspellings are still a problem and are not solved by this new algorithm.
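To show what catching a misspelling could look like, here is a minimal sketch using Python's standard `difflib` module. It is an assumption-laden illustration, not a description of how Author Finder or Author Identifier work: the function name `close_variants`, the small index list and the 0.85 cutoff are all my own choices.

```python
import difflib


def close_variants(name, index_entries, cutoff=0.85):
    """Return index entries whose spelling is close to `name` (case-insensitive)."""
    by_lowercase = {entry.lower(): entry for entry in index_entries}
    matches = difflib.get_close_matches(
        name.lower(), list(by_lowercase), n=10, cutoff=cutoff
    )
    return [by_lowercase[match] for match in matches]


# A tiny, hand-made author index using names from the examples above.
index_entries = [
    "Bernspang B",
    "Bernsprang B",
    "Rantapaa-Dahlqvist S",
    "Rantapaa-Dahlquist S",
    "Eriksson S",
]

print(close_variants("Bernspang B", index_entries))
print(close_variants("Rantapaa-Dahlqvist S", index_entries))
```

With this cutoff each search also returns its one-letter misspelling, while unrelated names stay out. A real author index would need a carefully tuned cutoff, since short surnames produce many false matches.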

Bauer et al have published two articles on citation search:

Bakkalbasi N, Bauer K, Glover J, Wang L (2006) Three options for citation tracking: Google Scholar, Scopus and Web of Science
Biomedical Digital Libraries, Vol. 3, No. 7, 29 June.

Bauer K and Bakkalbasi N (2005) An Examination of Citation Counts in a New Scholarly Communication Environment
D-Lib Magazine, Vol. 11, No. 9.

Noruzi also made some brief evaluations in an article:

Noruzi, Alireza Google Scholar: The New Generation of Citation Indexes
LIBRI Vol. 55, Iss. 4, p. 170-80

Belew compared citation search in WoS and Google Scholar:

Belew, RK (2005) Scientific impact quantity and quality: Analysis of two sources of bibliographic data. [PDF]

The first article, by Bauer and Bakkalbasi, showed that the citation count in GS was higher than in WoS and Scopus for articles from 2000. For 1985, however, WoS seems to cover citations best. Comparing WoS and Scopus, WoS found more citations for 1985, but for 2000 the counts were similar.

The next article of Bauer, Bakkalbasi et al evaluated journal articles from two disciplines: oncology and condensed matter physics (CM physics) and two years: 1993 and 2003.

Their conclusion is: “This study did not identify any one of the three tools studied to be the answer to all citation tracking needs”. Scopus shows strength for oncology articles from 2003, but WoS performed better for CM physics and was better for articles from both disciplines published in 1993. GS returned smaller numbers but had a large set of unique citing material for 2003. Bauer, Bakkalbasi et al state: “…it is clear that Google Scholar provides unique citing material.”

The article by Belew compares GS with WoS by author search. Belew randomly selected six academics from the same interdisciplinary department, and bibliographies of all publications by these authors were manually reconciled against 203 references found by one or both systems. WoS discovered 4741 citations and GS 4045, but when evaluating each author separately, 2 authors get significantly more citations in GS.

Belew indicates that, because of the quality of some bibliographic citations, it’s common to find that the same publication has been treated as more than one record. When using Cited Ref Search in WoS for an author you can find these types of errors, as Belew shows in Table 1. With these types of errors it’s possible to lose citations for an article in WoS, but sometimes there are instead duplicates of an article (preprint and original article) that inflate the citation count. Belew does not discuss that GS often shows duplicates, or that the displayed number of times cited is sometimes incorrect when you check it manually.

Belew’s conclusion is: “GS seems competitive in terms of coverage for materials published in the last twenty years; before then WoS seems to dominate”.

Earlier this year we also made a small test between Scopus and WoS by searching author names, but only author names that we could identify as unique.

Noruzi made free-text searches when testing citation search, with the search statement: webometrics OR webometric. A free-text search is not a proper subject search. As Bauer et al point out, the WoS, GS and Scopus databases process a free-text search in different ways. For example, Google Scholar even indexes the full text of articles, in contrast to WoS and Scopus. In this case, though, Noruzi has still just compared each known article, although the method of choosing articles and the low number of articles are open to question.

Neither Belew nor Bauer et al have discussed the problems with citation counting in Google Scholar. Peter Jacso, though, has criticized Bauer et al and presented examples of flaws in Google Scholar:

Jacso, Peter ([2005b]) Google Scholar and The Scientist
(Published on university homesite as extra material). [online]

I believe the percentage of flaws in Google Scholar may not significantly decrease the value of the findings of Belew and Bauer et al, but it should be considered and discussed. Research on the proportion of citation counting flaws in Google Scholar would be of considerable value for future evaluations.

I checked the citation counting in Google Scholar of the first article Noruzi refers to in his test in Table 2:

C Almind and P. Ingwersen Informetric analyses on the world wide web Journal of Documentation Vol. 54 Iss. 4, p. 404-426

Check this screenshot:

I received 4 hits, where the first hit clusters 11 duplicates (look at the link group of 11). 3 duplicates (hits 2-4) are unclustered. Together that makes 192 citations for the article by Almind et al. But if you check the reliability of the citations in all hits you will find duplicates. I manually evaluated all 192 citations together and found 13 obvious duplicates. Since the check was manual, some more duplicates may be found. Records in Chinese characters were not checked. Here are screenshots of all duplicates put together with image editing software:

Of 192 citations from GS, 12 are duplicates, which gives these results: GS 180, WoS 90. This means about 6% of the GS citations are incorrect.
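The arithmetic behind these figures is easy to reproduce. The short sketch below uses the 12 duplicates from the calculation, since that is the number consistent with the corrected count of 180 (the 13 "obvious duplicates" mentioned earlier would give 179). The function name `duplicate_share` is just an illustrative choice.

```python
def duplicate_share(total_citations, duplicates):
    """Return the corrected citation count and the share of duplicates."""
    corrected = total_citations - duplicates
    share = duplicates / total_citations
    return corrected, share


corrected, share = duplicate_share(192, 12)
print(f"Corrected GS count: {corrected}")
print(f"Share of duplicates: {share:.2%}")
```

This gives a corrected count of 180 and a duplicate share of 6.25%, which rounds to the 6% quoted above.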

Of course WoS could have duplicates also.

Conclusion: Scopus is important for finding more citations from 1996 to the present. Google Scholar is important because it finds a lot of unique citations, but each reference with information on times cited should be checked manually by counting and looking for duplicates. Web of Science is still competitive, especially for older material.

As Noruzi mentions in his article, GS indexes many more publication types, in various languages. If every citation, no matter from which source, has the same value of 100%, GS is an important source. The discussion should extend to the value of each citation. Should self-citation get any value at all? Should articles that are not peer-reviewed get a lower value for their citations?

I've made a page with a list of conference presentations reviewing Scopus and Google Scholar, and also Web of Science in comparison with those databases. Three examples follow below:

Jacso, Peter "The Endangered Database Species: Are the traditional commercial indexing/abstracting & full-text databases dead?"[PPT]
UK Serials Group 29th UKSG Annual Conference and Exhibition, University of Warwick 3-5 April 2006.

Jenkins, JR "Article Linker Integration with Google Scholar (or Google Scholar as referring source)"[PPT]
OpenURL and Metasearch: New Standards, Current Innovations, and Future Directions, September 19, 20, 21, 2005, Washington, DC.

Tarantino, Ezio "Scopus, WOK, Google Scholar: too much or not too much?" [PPT]
The International Coalition of Library Consortia Autumn 2005 7th European Meeting, Poznan, Poland 28.09 – 01.10, 2005.

If you have more suggestions, just post a comment. Please note, however, that it should be a conference presentation (i.e., not a lecture) and it should focus on some of the above-mentioned sources.

We made some citation frequency comparisons between Scopus, Web of Science and Google Scholar. As Scopus counts citations from 1996, we limited the comparisons to articles published from 1996 to the present. The figures in the screenshot showed:

Scopus finds 9% more citations than Web of Science when limited to articles from 1996-.

Scopus finds 20% more citations than Google Scholar when limited to articles from 1996-.

Web of Science finds 10% more citations than Google Scholar when limited to articles from 1996-.
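As a quick sanity check, the three percentages can be tested against each other. The sketch below assumes each figure is a simple ratio of total citation counts, which the text does not state explicitly. The function names are illustrative, and the counts passed to `percent_more` are made up, since the raw totals are only in the screenshot.

```python
def percent_more(count_a, count_b):
    """How many percent more citations source A finds than source B."""
    return (count_a - count_b) / count_b * 100


def chain_percentages(*percentages):
    """Combine 'A is p% above B' and 'B is q% above C' into 'A is x% above C'."""
    ratio = 1.0
    for percentage in percentages:
        ratio *= 1 + percentage / 100
    return (ratio - 1) * 100


# Scopus is 9% above WoS, and WoS is 10% above Google Scholar.
implied = chain_percentages(9, 10)
print(f"Implied Scopus vs Google Scholar: {implied:.1f}% more")

# Hypothetical counts, only to show how a single percentage is derived.
print(f"{percent_more(120, 100):.1f}% more")
```

Chaining 9% and 10% gives about 19.9%, so the reported 20% for Scopus against Google Scholar is consistent with the other two figures.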

It is important to know that Web of Science indexes more than 9,000 journals compared to 15,000 in Scopus, though Web of Science argues that (according to Bradford's Law) they have the core journals, which have the most citations. Google Scholar has no list of the journals and other sources they index, but they index articles from the proprietary web as well as scholarly archives, master's theses, books etc. Google Scholar citation counting is not working properly either, as we already pointed out in a previous posting. In this test, not all cited references from Scopus were retrieved, just the indexed articles.

As we also already mentioned, the article "An Examination of Citation Counts in a New Scholarly Communication Environment" by Kathleen Bauer et al at Yale University Library, published in D-Lib Magazine, September 2005, Vol. 11, No. 9, also did some citation counting. But whereas we simply counted all citations for a random set of 5 authors at Umeå university, Bauer et al compared the average number of times an article is cited. Neither our test nor the test by Bauer et al checked Google Scholar's inconsistencies in citation counting and duplicates.

Some of the findings from the article by Bauer et al follow below. The information derives from the tables in their article.

The search for articles published 2000 in Journal of the American Society for Information Science and Technology (JASIST) showed for example:

Web of Science counts 0.3 more citations than Scopus.

The search for articles published 1985 in Journal of the American Society for Information Science and Technology (JASIST) showed for example:

Web of Science counts 11.9 more citations than Scopus.

Because Scopus only counts citations from articles published from 1996 to the present, the 11.9 difference is not surprising. The 0.3 difference for articles published in 2000 is more questionable, though. This test by Bauer et al has its limitations because it's limited to just one journal (i.e., JASIST).

Conclusion: Different testing methods at least show that Scopus definitely is important when searching citations for articles published from 1996. Due to its inconsistencies, Google Scholar is not recommended as the sole tool for citation search.

It's hard to make an easy yet deep and thorough evaluation of subject coverage in Scopus, Web of Science and Google Scholar, for a lot of reasons, especially because the databases in question do not use established thesauri. Still, I made a small comparison between these multidisciplinary databases and PubMed.

I chose three MeSH terms (two of them with subheadings included) of three words each. I limited my search to 1996 onward, mainly because Scopus subject coverage before 1996 is selective. The MeSH terms were:

Hormone replacement therapy
Antifreeze proteins toxicity
Neonatal screening ethics

Result from PubMed searching MeSH database:

Result from Scopus searching field keywords:

Result when broadening the search to title, abstracts and keywords:

Result from Web of Science when searching Topic in General search which include title, abstracts and Keywords (author keywords and keywords plus):

As for the results in Google Scholar, they are harder to evaluate, because Google Scholar indexes significant parts of the full text. It's possible to limit to a title search but not to abstracts, for example. A lot of the material Google Scholar indexes is retrieved from the open web, and other material is journal article references (and full text) from publishers. Google Scholar has not integrated any thesauri for the article references, however. Instead they have 7 subject areas available for limiting in advanced search. As seen in this screenshot, one of the 7 subject areas is Medicine, Pharmacology and Veterinary Medicine. I limited the search to that subject area and the timespan 1996-.

2310 hits are definitely more than the others returned. As you see, the second hit is definitely of high relevance, but the other hits have the word ethics indexed from the full text, where it is part of the phrase ethics committee, so they are not necessarily relevant.

Screen shot of search on antifreeze proteins toxicity shows 60 hits:

Not all of these hits are relevant and some are hits from books.

Screen shot of search on hormone replacement therapy shows 26,200 hits:

Conclusion: Web of Science, Scopus and Google Scholar are not recommended for exhaustive, specific searches where all possibly important records of current science have to be found. This is because thesauri and controlled vocabulary are either not integrated at all or not integrated properly.

Broadening a subject search in Scopus from Keywords to Title, abstracts and keywords gives higher recall, but the records are not always relevant. To broaden a search, both Scopus and Google Scholar are recommended, but not Web of Science, which indexes less material from 1996.

From Elsevier databases Scopus has integrated thesauri like GEOBASE Subject Index (geology, geography, earth and environmental science), EMTREE (life sciences, health), MeSH (life sciences, health), FLX terms and WTA terms (fluid sciences, textile sciences), Regional Index (geology, geography, earth and environmental science), Species Index (biology, life sciences), Ei thesaurus (controlled and uncontrolled terms) (engineering, technology, physical sciences). As you see, the last one includes uncontrolled terms. Scopus also integrates author keywords, which are uncontrolled keywords supplied by the author of the article.

When searching the field Keywords in Scopus you won’t get just controlled vocabularies, you will also get uncontrolled vocabulary from Ei and author keywords.

Take this reference from PubMed:
Nicolau B, Marcenes W, Bartley M, Sheiham A.
Associations between socio-economic circumstances at two stages of life and
adolescents' oral health status.
J Public Health Dent. 2005 Winter;65(1):14-20.

The Scopus record does not have the MeSH terms (from PubMed) or EMTREE terms (from Embase) integrated, just author keywords, as you see in this screen shot:

Another example of a Scopus record with no EMTREE terms. (No MeSH headings exist yet because it's a PubMed in process record).

And the same reference in Embase with EMTREE terms:

The following reference is from PubMed:

Anderson C.
Breast cancer. How not to publicize a misconduct finding.
Science. 1994 Mar 25;263(5154):1679.

See the MeSH terms in the screen shot. A term marked with an asterisk (*) is a major MeSH heading:

Not all major MeSH headings are included in Scopus reference:

The Ei thesaurus, sometimes called the Compendex thesaurus, is not properly implemented either. In this screen shot you find a record from Scopus with no Ei thesaurus terms attached:

And here's a screen shot from the database Compendex showing the same record with Ei thesaurus terms attached:

When testing and comparing Compendex with Scopus, quite a lot of records didn't have Ei thesaurus terms integrated, but when they exist on Scopus records, main headings, controlled terms and uncontrolled terms have all been integrated properly.

Unfortunately, subject search in Scopus has a lot of disadvantages:

  1. It’s impossible to browse the keywords and thesauri integrated in Scopus.
  2. The thesauri are inconsistently integrated: sometimes no MeSH terms, sometimes no Emtree, sometimes not all major MeSH headings.
  3. Uncontrolled terms are mixed with controlled ones and not possible to separate when refining a search.
  4. You can’t choose which thesauri to use.
  5. No mapping of terms as in Embase and PubMed.
  6. No possibility to explode terms.
  7. No integration of MeSH subheadings.

Conclusion: This means Scopus is impossible to use for refined and comprehensive subject search. You have to use PubMed to properly use MeSH terms, Embase to properly use Emtree and Compendex to properly use Ei terms. Of course Scopus is not built to substitute for the Elsevier databases. That’s why I don’t think Scopus will ever implement the thesauri of the Elsevier databases properly. But why subject search with MeSH terms is not properly implemented, when PubMed is a free source, is very strange.

In Scopus you have a separate bar for Author search. You could also choose to limit to Author in Basic search or use search operators in Advanced search. When searching rantapaa s or Rantapää s with the Swedish/Finnish diacritic character ä, you receive 23 hits, including the article indexed with the full first name Solbritt.

But if you choose to search with Rantapaa-Dahlqvist S or Rantapaa Dahlqvist you lose the hit with first name Solbritt, because that record is spelled Dahlquist.

But searching Rantapaa Dahlquist S doesn’t find all Dahlquist-spelled records. Neither does a search for Rantapaa Dahlquist SB, or with dots (S.B.) or with a space (S B).

When you instead search Dahlquist s, you will find the Dahlquist-spelled records.
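Since each of these variants has to be tried by hand, it can help to generate the list of search strings up front. The sketch below is a rough illustration in Python using only the standard library. The function names and the choice of variants (diacritics folded, hyphen or space, three ways of writing the initials) are my own assumptions based on the examples above, not a feature of Scopus.

```python
import unicodedata


def fold_diacritics(text):
    """Replace accented letters with their base letters, e.g. ä becomes a."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def initials_variants(initials):
    """Write initials as 'SB', 'S B' and 'S.B.'."""
    letters = [ch for ch in initials if ch.isalpha()]
    return {"".join(letters), " ".join(letters), ".".join(letters) + "."}


def search_variants(surname_spellings, initials):
    """Combine surname spellings, hyphen/space forms and initial styles."""
    variants = set()
    for surname in surname_spellings:
        folded = fold_diacritics(surname)
        for form in {folded, folded.replace("-", " ")}:
            for initial_form in initials_variants(initials):
                variants.add(f"{form} {initial_form}")
    return sorted(variants)


# Known spellings of the surname still have to be supplied by hand.
for query in search_variants(["Rantapää-Dahlqvist", "Rantapää-Dahlquist"], "SB"):
    print(query)
```

The list still depends on knowing the likely misspellings in advance, which is exactly the problem: the qv/qu variant only turns up once you have seen it in a record.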

So when I click on the 20 hits from my initial search Rantapaa Dahlqvist S I get 69 (!) records instead of 20. How come?

Even stranger: When clicking the author name in one of the records, 71 records are retrieved. But the number of records retrieved in the initial search was 72. Could it be more confusing? View this video [AVI]

Still, none of the 72 records mentioned includes the article that Rantapää Dahlqvist S published in Lancet in 1998. Searching for that article by title shows that her name is entered as Dahlqvist SR. Clicking on her name under author(s) gives 28 hits. 4 of them are hits on other spelling variants of her name, but the 28 do not include all 72 articles from the initial search.

If you check this article “New concept in echocardiography” in PubMed you will see it’s implemented with Rantapaa Dahlqvist S. Checking the article in the electronic source of Lancet via Science Direct (Yes, it’s owned by Elsevier who owns Scopus!) shows the name implemented as Rantapaa Dahlqvist, Solbritt.

So how do you find all articles by Rantapaa Dahlqvist S and her author variants? The initial search Rantapaa S gives 72 records. Searching Dahlquist S gives 2 records. Searching Dahlqvist SR gives 24 records. Together 98.

Conclusion: Though this author search is an extreme example, it shows the problem with how author names are implemented. You have to search all possible misspellings of a name. Preferably, you should have a proper publication list from the author. And finally, WHY is the reference from Lancet discussed above consistent in one Elsevier product (Science Direct) and not in the other (Scopus)?
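One caveat when adding up hits from several variant searches: the sum is only the number of distinct articles if no record turns up in more than one search. A minimal Python sketch of merging result sets by record ID follows. The record IDs are invented for illustration and the function name `merge_result_sets` is hypothetical, since real IDs would have to be exported from the database.

```python
def merge_result_sets(result_sets):
    """Union record IDs across searches so a record found twice is counted once."""
    merged = set()
    for record_ids in result_sets.values():
        merged.update(record_ids)
    return merged


# Hypothetical record IDs, invented for illustration only.
result_sets = {
    "Rantapaa S": {"r1", "r2", "r3"},
    "Dahlquist S": {"r3", "r4"},
    "Dahlqvist SR": {"r5"},
}

total_hits = sum(len(record_ids) for record_ids in result_sets.values())
distinct_records = merge_result_sets(result_sets)
print(total_hits, len(distinct_records))
```

In this toy data the three searches return 6 hits in total but only 5 distinct records, because one record is found under two name variants.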
