The San Francisco Declaration on Research Assessment (DORA), discussed in our most recent blog post, focuses on the Journal Impact Factor (JIF), published in the Web of Science by Thomson Reuters. It is a strong plea to base research assessments of individual researchers, research groups, and submitted grant proposals not on journal metrics but on article-based metrics combined with peer review. DORA cites a few scientometric studies to bolster this argument. So what is the evidence we have about the JIF?
In the 1990s, the Norwegian researcher Per Seglen, based at our sister institute the Institute for Studies in Higher Education and Research (NIFU) in Oslo, and a number of CWTS researchers (in particular Henk Moed and Thed van Leeuwen) developed a systematic critique of the JIF, of its validity as well as of the way it is calculated (Moed & Van Leeuwen, 1995, 1996; Seglen, 1997). This line of research has since blossomed in a variety of disciplinary contexts and has identified three main reasons not to use the JIF in research assessments of individuals and research groups.
First, although the JIF of a journal is derived from the aggregated citation rates of its individual articles, it cannot be used as a stand-in for those article-level rates in research assessments. This is because a small number of articles are cited very heavily, while a large number are cited only occasionally, and some are not cited at all. This skewed distribution is a general phenomenon in citation patterns and holds for all journals. Therefore, the fact that a researcher has published an article in a high-impact journal does not mean that her particular piece of research will also have a high impact.
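The skew can be illustrated with a small simulation. All numbers below are invented, not drawn from any real journal; the point is only that with a heavy-tailed distribution the journal-level average says little about the typical article.

```python
import random

random.seed(42)

# Invented data: per-article citation counts drawn from a heavy-tailed
# (Pareto) distribution, mimicking the skew observed in real journals.
citations = [int(random.paretovariate(1.5)) - 1 for _ in range(1000)]

mean = sum(citations) / len(citations)           # what a JIF-style average reports
median = sorted(citations)[len(citations) // 2]  # what a typical article receives
uncited = sum(1 for c in citations if c == 0)

print(f"mean citations per article:   {mean:.2f}")
print(f"median citations per article: {median}")
print(f"uncited articles:             {uncited} of {len(citations)}")
```

The mean is pulled upward by a handful of heavily cited papers while the median article sits far below it, which is exactly why a journal-level average cannot stand in for the citedness of any individual article.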
Second, fields differ strongly in their typical JIF values. A field with a rapid turnover of research publications and long reference lists (such as fields in biomedical research) will tend to have much higher JIF values for its journals than a field with short reference lists in which older publications remain relevant much longer (such as fields in mathematics). Moreover, smaller fields usually have a smaller number of journals, leaving fewer opportunities to publish in high-impact journals. As a result, it does not make sense to compare JIF values across fields. Although virtually everybody knows this, implicit comparisons are still common, for example when publications are compared on their JIF values in multi-disciplinary settings (such as grant proposal reviews).
Third, the way in which the JIF is calculated in the Web of Science has a number of technical characteristics that make it relatively easy for journal editors to game. The JIF divides the number of citations received in a given year by publications from the previous two years by the number of "citeable publications" in those two years. Some publications do not count as "citeable" even though, when cited, they do contribute to the total number of citations. By increasing the relative share of these publications in the journal, an editor can try to artificially inflate the JIF. The same can be accomplished by increasing the number of publications that are cited more frequently, such as review articles, long articles, or clinical trials. Finally, the editor can try to convince or pressure submitting authors to cite more publications in the journal itself. All three forms of manipulation do occur, although we do not really know how frequently. Sometimes the manipulation is plainly visible: editors have written editorials about their citation impact, citing all publications from the past two years in their own journal and admonishing authors to increase the JIF!
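The asymmetry between numerator and denominator can be made concrete with a minimal sketch (all numbers invented):

```python
# JIF for year Y: citations received in Y to items published in Y-1 and Y-2,
# divided by the number of "citeable" items (articles, reviews) in Y-1 and Y-2.
citations_received = 1200  # includes citations to editorials, letters, news items
citeable_items = 400       # editorials, letters, etc. are excluded here

jif = citations_received / citeable_items
print(f"JIF: {jif:.2f}")  # 3.00

# Gaming the asymmetry: reclassify 50 items into "non-citeable" formats.
# Any citations they attract still count in the numerator,
# but the items themselves leave the denominator.
gamed_jif = citations_received / (citeable_items - 50)
print(f"JIF after reclassification: {gamed_jif:.2f}")  # 3.43
```

Nothing about the journal's actual citation impact has changed between the two numbers; only the bookkeeping of what counts as "citeable" has.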
A more generic problem with using the JIF in research assessment is that not all fields have meaningful JIF values, since the JIF is only calculated for journals covered in the Web of Science. Scholarly fields focusing on books or technical designs are disadvantaged in evaluations in which the JIF is important.
In response to these problems, five main journal impact indicators have been developed as improvements upon, or alternatives to, the JIF. First, the CWTS Journal to Field Impact Score (JFIS) improves upon the JIF because it does away with the difference between numerator and denominator regarding "citeable items" and because it takes field differences in citation density into account. Second, the SCImago Journal Rank (SJR) indicator follows the same logic as Google's PageRank algorithm: citations from highly cited journals carry more weight than citations from rarely cited ones. SCImago, based in Madrid, calculates the SJR not on the basis of the Web of Science but on the basis of the Scopus citation database (published by Elsevier). A similar logic is applied in two other journal impact indicators from the Eigenfactor.org research project, based at the biology department of the University of Washington (Seattle): the Eigenfactor and the Article Influence Score (AIS). These are calculated on the basis of the Web of Science and use a citation window of five years (citations to articles published in the previous five years count), whereas this window is two years for the JIF and three years for the SJR.
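The PageRank-style logic behind SJR, the Eigenfactor, and the AIS can be sketched as a toy power iteration over an invented cross-citation matrix of three journals. The damping factor and citation counts below are illustrative assumptions, not the actual parameters of either indicator.

```python
# cites[i][j] = citations from journal i to journal j (invented numbers)
cites = [
    [0, 10, 2],
    [5, 0, 3],
    [1, 4, 0],
]
n = len(cites)
damping = 0.85
prestige = [1.0 / n] * n  # start with equal prestige

for _ in range(100):  # power iteration until (approximate) convergence
    new = []
    for j in range(n):
        # Each journal i spreads its prestige over its outgoing citations;
        # journal j collects the shares directed at it.
        inflow = sum(
            prestige[i] * cites[i][j] / sum(cites[i])
            for i in range(n)
        )
        new.append((1 - damping) / n + damping * inflow)
    prestige = new

print([round(p, 3) for p in prestige])
```

A journal's prestige is thus the damped sum of incoming citation shares weighted by the prestige of the citing journals, so a citation from a prestigious journal is worth more than one from a rarely cited journal.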
The fifth journal impact indicator is computed on the basis of Scopus by CWTS: the Source Normalized Impact per Paper (SNIP) indicator, invented by Henk Moed and further developed by Nees Jan van Eck, Thed van Leeuwen, Martijn Visser and Ludo Waltman (Waltman, Van Eck, Van Leeuwen, & Visser, 2013). This indicator also weights citations, but not by the number of citations to the citing journal; instead it weights by the number of references in the citing article. Basically, the citing paper is seen as casting one vote, which is distributed over all the papers it cites. As a result, a citation from a paper with 10 references adds 1/10th to the citation count, whereas a citation from a paper with 100 references adds only 1/100th. The effect is that the SNIP indicator cancels out differences in citation density across fields (though certainly not all relevant differences between disciplines, such as the amount of work needed to publish an article). The Eigenfactor also uses this principle in its implementation of the PageRank algorithm.
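The fractional-counting idea at the heart of SNIP can be sketched in a few lines (reference counts invented):

```python
# For each citation a paper receives, record how many references
# the citing paper contains (invented numbers).
citing_paper_refs = [10, 100, 25, 50]

raw_citations = len(citing_paper_refs)
# Each citing paper casts one vote spread over its reference list,
# so a citation from a paper with r references contributes 1/r.
fractional = sum(1 / r for r in citing_paper_refs)

print(f"raw citation count:   {raw_citations}")
print(f"fractionally counted: {fractional:.3f}")
```

The citation from the 10-reference paper (1/10) counts ten times as much as the one from the 100-reference paper (1/100), which is how the weighting damps the advantage of fields with long reference lists.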
The improved journal impact indicators do solve a number of problems that have emerged in the use of the JIF. Nevertheless, careless use of journal impact indicators in research assessments remains unjustified. All journal impact indicators are in the end based on the number of citations to the individual articles in the journal, but this correlation is too weak to legitimize using a journal indicator instead of assessing the articles themselves. When journal indicators do take differences between fields into account, citation counts for the full sets of articles produced by research groups tend to correlate somewhat more strongly with the journal indicators, but the correlation remains very modest. Research groups tend to publish across a whole range of journals with both higher and lower impact factors. It will therefore usually be much more accurate to analyze the influence of these bodies of work directly rather than fall back on journal indicators.
To sum up, the bibliometric evidence confirms the main thrust of DORA: it is not sensible to use the JIF or any other journal impact indicator as a predictor of the citedness of a particular paper or set of papers. But does this mean, as DORA seems to suggest, that journal impact factors do not make any sense at all? Here I think DORA is wrong. At the level of the journal, the improved impact factors do give interesting information about the role and position of the journal, especially when combined with qualitative information about the peer review process, an analysis of who is citing the journal and in which context, and its editorial policies. No editor would want to miss the opportunity to analyze their journal's role in the scientific communication process, and journal indicators can play an informative, supporting role in that analysis. It also makes perfect sense in the context of research evaluation to take into account whether a researcher has been able to publish in a high-quality scholarly journal. But journal impact factors should not rule the world.
Moed, H. F., & Van Leeuwen, T. N. (1996). Impact factors can mislead. Nature, 381(6579), 186.
Moed, H. F., & Van Leeuwen, T. N. (1995). Improving the accuracy of Institute for Scientific Information's journal impact factors. Journal of the American Society for Information Science, 46(6), 461–467. Retrieved from http://www.iem.ac.ru/~kalinich/rus-sci/ISI-CI-IF.pdf
Seglen, P. O. (1997). Why the impact factor of journals should not be used for evaluating research. BMJ (Clinical research ed.), 314(7079), 498–502. Retrieved from http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2126010&tool=pmcentrez&rendertype=abstract
Waltman, L., Van Eck, N. J., Van Leeuwen, T. N., & Visser, M. S. (2013). Some modifications to the SNIP journal impact indicator. Journal of Informetrics, 7(2), 272–285. Retrieved from http://www.sciencedirect.com/science/article/pii/S1751157712001010
I would like to thank Thed van Leeuwen and Ludo Waltman for their comments on an earlier draft of this post.
June 7, 2013 at 12:37 PM
The information in this post is highly relevant for those who favour a scientific approach to quality assessment in the sciences. It should, however, also be taken to heart by authorities and their bookkeepers in the academic world. Most importantly, researchers themselves should be aware of the pitfalls in the everyday use of the JIF and other indicators, which reflects an ideological or managerial approach rather than a scientific one. In workplace discussions I am often confronted with a lack of knowledge on these matters (and sometimes even with a lack of interest).
June 20, 2013 at 2:55 PM
Reblogged this on jbrittholbrook and commented:
With the release of the new Journal Impact Factors, everyone should read this blog posted by Paul Wouters at “The Citation Culture.”