Metrics in research assessment under review

This week the Higher Education Funding Council for England (HEFCE) published a call to gather “views and evidence relating to the use of metrics in research assessment and management” http://www.hefce.ac.uk/news/newsarchive/2014/news87111.html. The council has established an international steering group that will perform an independent review of the role of metrics in research assessment. The review is intended to inform the next installment of the Research Excellence Framework (REF) and will be completed in spring 2015.

Interestingly, two members of the European ACUMEN project http://research-acumen.eu/ are members of the 12-person steering group – Mike Thelwall (professor of cybermetrics at the University of Wolverhampton http://cybermetrics.wlv.ac.uk/index.html) and myself – and it is led by James Wilsdon, professor of Science and Democracy at the Science Policy Research Unit (SPRU) at the University of Sussex. The London School of Economics scholar Jane Tinkler, co-author of the book The Impact of the Social Sciences, is also a member and has put together some reading material on their blog http://blogs.lse.ac.uk/impactofsocialsciences/2014/04/03/reading-list-for-hefcemetrics/. So there will be ample input from the social sciences to analyze both the promises and the pitfalls of using metrics in the British research assessment procedures. The British clearly see this as an important issue. The creation of the steering group was announced by the British minister for universities and science, David Willetts, at the Universities UK conference on April 3 https://www.gov.uk/government/speeches/contribution-of-uk-universities-to-national-and-local-economic-growth. In addition to science & technology studies experts, the steering group consists of scientists from the most important stakeholders in the British science system.

At CWTS, we responded enthusiastically to the invitation by HEFCE to contribute to this work, because this approach resonates so well with the CWTS research programme http://www.cwts.nl/pdf/cwts_research_programme_2012-2015.pdf. The review will focus on: identifying useful metrics for research assessment; how metrics should be used in research assessment; ‘gaming’ and strategic use of metrics; and the international perspective.

All the important questions about metrics have been put on the table by the steering group, among them:

–       What empirical evidence (qualitative or quantitative) is needed for the evaluation of research, research outputs and career decisions?

–       What metric indicators are useful for the assessment of research outputs, research impacts and research environments?

–       What are the implications of the disciplinary differences in practices and norms of research culture for the use of metrics?

–       What evidence supports the use of metrics as good indicators of research quality?

–       Is there evidence that the move to more open access to the research literature will enable new metrics to be used or enhance the usefulness of existing metrics?

–       What evidence exists around the strategic behaviour of researchers, research managers and publishers responding to specific metrics?

–       Has strategic behaviour invalidated the use of metrics and/or led to unacceptable effects?

–       What are the risks that some groups within the academic community might be disproportionately disadvantaged by the use of metrics for research assessment and management?

–       What can be done to minimise ‘gaming’ and ensure the use of metrics is as objective and fit-for-purpose as possible?

The steering group also calls for evidence on these issues from other countries. If you wish to contribute evidence to the HEFCE review, please make it clear in your response whether you are responding as an individual or on behalf of a group or organisation. Responses should be sent to metrics@hefce.ac.uk by noon on Monday 30 June 2014. The steering group will consider all responses received by this deadline.


On citation stress and publication pressure

Our article on citation stress and publication pressure in biomedicine went online this week – co-authored with colleagues from the Free University and University Medical Centre Utrecht:

Tijdink, J.K., S. de Rijcke, C.H. Vinkers, Y.M. Smulders, P.F. Wouters, 2014. Publicatiedrang en citatiestress: De invloed van prestatie-indicatoren op wetenschapsbeoefening. Nederlands Tijdschrift voor Geneeskunde 158: A7147.

* Dutch only *

Tales from the field: On the (not so) secret life of performance indicators

* Guest blog post by Alex Rushforth *

In the coming months Sarah de Rijcke and I will present at conferences in Valencia and Rotterdam on research from CWTS’s nascent EPIC working group. We very much look forward to drawing on collaborative work from our ongoing ‘Impact of indicators’ project on biomedical research in University Medical Centers (UMCs) in the Netherlands. One of our motivations behind the project is that there has been a wealth of social science literature in recent years on the effects of formal evaluation in public sector organisations, including universities. Yet too few studies have taken seriously the presence of indicators in the context of one of the university’s core missions: knowledge creation. Fewer still have taken an ethnographic lens to the dynamics of indicators in the day-to-day context of academic knowledge work. These are deficits we hope to begin addressing through these conferences and beyond.

The puzzle we will be addressing here appears – at least at first glance – straightforward enough: what is the role of bibliometric performance indicators in the biomedical knowledge production process? Yet comparing provisional findings from two contrasting case studies of research groups from the same UMC – one a molecular biology group and the other a statistics group – it quickly becomes apparent that there can be no general answer to this question. As such we aim not only to provide an inventory of the different ‘roles’ of indicators in these two cases, but also to pose the more interesting analytical question of which conditions and mechanisms explain the observed variations in the roles indicators come to perform.

Owing to their persistent recurrence in the data so far, the indicators we will analyze are the journal impact factor, the h-index, and ‘advanced’ citation-based bibliometric indicators. It should be stressed that our focus on these particular indicators has emerged inductively from observing first-hand the metrics that research groups attended to in their knowledge-making activities. So what have we found so far?

Dutch UMCs constitute particularly apt sites through which to explore this problem, given how central bibliometric assessments have been to the formal evaluations carried out since their inception in the early 2000s. On one level, researchers in both cases encounter such metrics as ‘governance/managerial devices’, that is, as forms of information required of them by external agencies on whom they depend for resources and legitimacy. Examples can be seen when funding applications, annual performance appraisals, or job descriptions demand such information about an individual’s or group’s past performance. As the findings will show, the information the two groups need to produce their work effectively and the types of demands made on them by ‘external’ agencies vary considerably, despite their common location in the same UMC. This is one important reason why the role of indicators differs between the cases.

However, this coercive ‘power over’ account is but one dimension of a satisfying answer to our question about the role of indicators. Emerging analysis also reveals the surprising discovery that in fields characterized by particularly integrated forms of coordination and standardization (Whitley, 2000) – like our molecular biologists – indicators in fact have the propensity to function as a core feature of the knowledge-making process. For instance, a performance indicator like the journal impact factor was routinely mobilized informally in researchers’ decision-making as an ad hoc standard against which to evaluate the likely usefulness of information and resources, and to decide whether time and effort should be spent pursuing them. By contrast, in the less centralized and integrated field of statistical research such an indicator was not so indispensable to the routines of knowledge making. In the case of the statisticians it is possible to speculate that indicators are more likely to emerge intermittently as conditions to be met for gaining social and cultural acceptance by external agencies, but are less likely to inform day-to-day decisions. Through our ongoing analysis we aim to unpack further how disciplinary practices interact with the organisation of Dutch UMCs to produce quite varying engagements with indicators.

The extent to which indicators play central or peripheral roles in research production processes across academic contexts is an important sociological problem, one that needs to be posed in order to better understand the complex role of performance indicators in academic life. We feel much of the existing literature on the evaluation of public organisations has tended to paint an exaggerated picture of formal evaluation and research metrics as synonymous with empty ritual and legitimacy-seeking (e.g. Dahler-Larsen, 2012). Emerging results here show that – at least in the realm of knowledge production – the picture is more subtle. This insight prompts us to suggest that further empirical studies are needed of scholarly fields with different patterns of work organisation, in order to compare our results and develop middle-range theorizing on the mechanisms through which metrics infiltrate knowledge production processes to fundamental or peripheral degrees. In the future this could mean venturing into fields far outside biomedicine, such as history, literature, or sociology. For now, though, we look forward to expanding the biomedical project by conducting analogous case studies in a second UMC.

Indeed, it is through such theoretical developments that we can consider not only the appropriateness of one-size-fits-all models of performance evaluation, but also unpack and problematize discourses about what constitutes ‘misuse’ of metrics. And indeed, how convinced should we be that academic life is now saturated and dominated by deleterious metric indicators?

References

Dahler-Larsen, P. (2012). The Evaluation Society. Stanford, CA: Stanford Business Books.

Whitley, R. (2000). The Intellectual and Social Organization of the Sciences. Oxford: Oxford University Press.

Selling science to Nature

On Saturday 22 December, the Dutch national newspaper NRC published an interview with Hans Clevers, professor of molecular genetics and president of the Royal Netherlands Academy of Arts and Sciences (KNAW). The interview is the latest in a series of public appearances following Clevers’ installation as president in 2012, in which he responds to current concerns about the need for revisions in the governance of science. The recent Science in Transition initiative, for instance, stirred quite some debate in the Netherlands, also within the Academy. One of the most hotly debated issues is quality control, an issue that encompasses the implications of increasing publication pressure, purported flaws in the peer review system, impact factor manipulation, and the need for new forms of data quality management.

Clevers currently combines the KNAW presidency with his group leadership at the Hubrecht Institute in Utrecht. In both roles he actively promotes data sharing. He told the NRC that he encourages his own researchers to share all findings. “Everything is for the entire lab. Asians in particular sometimes need to be scolded for trying to keep things to themselves.” When it comes to publishing the findings, it is Clevers who decides who contributed most to a particular project and who deserves to be first author. “This can be a big deal for the careers of PhD students and post-docs.” The articles for ‘top journals’ like Nature or Science he always writes himself. “I know what the journals expect. It requires great precision. A title consists of 102 characters. It should be spot-on in terms of content, but it should also be exciting.”

Clevers does acknowledge some of the problems with the current governance of science — the issue of data sharing and mistrust mentioned above, but also, for instance, the systematic imbalance in the academic reward system when it comes to appreciation for teaching. However, he does not seem very concerned about publication pressure. He has argued on numerous occasions that publishing is simply part of daily scientific life. According to him, the number of articles is not a leading criterion. In most fields, it is the quality of the papers that matters most. With these statements Clevers clearly places himself in the mainstream view on scientific management. But there are also dissenting opinions, and sometimes they are voiced by other prominent scientists from the same field. Last month, Nobel Prize winner Randy Schekman, professor of molecular and cell biology at UC Berkeley, declared a boycott of three top-tier journals at the Nobel Prize ceremony in Stockholm. Schekman argued that Nature, Cell, Science and other “luxury” journals are damaging the scientific process by artificially restricting the number of papers they accept, by making improper use of the journal impact factor as a marketing tool, and by depending on editors who favor spectacular findings over the soundness of the results.

The Guardian published an article in which Schekman reiterated his critique. The newspaper also collected the reactions of the editors-in-chief of Nature, Cell and Science. They washed their hands of the matter. Some even delegated the problems to the scientists themselves. Philip Campbell, editor-in-chief of Nature, referred to a recent survey by the Nature Publishing Group which revealed that “[t]he research community tends towards an over-reliance in assessing research by the journal in which it appears, or the impact factor of that journal.”

In a previous blog post we paid attention to a call by Jos Engelen, president of the Netherlands Organization for Scientific Research (NWO), for an in-depth study of the editorial policies of Nature, Science, and Cell. It is worth reiterating some parts of his argument. According to Engelen the reputation of these journals, published by commercial publishers, is based on ‘selling’ innovative science derived from publicly funded research. Their “extremely selective publishing policy” has turned these journals into ‘brands’ that have ‘selling’ as their primary interest, and not, for example, “promoting the best researchers.” Here we see the contours of a disagreement with Clevers. Without wanting to read too much into his statements, Clevers on more than one occasion treats the status and quality of Nature, Cell and Science as apparently self-evident — as the main current of thought would have it. But in the NRC interview Clevers also does something else: by explaining his policy of writing the ‘top papers’ himself, he also reveals that these papers are as much the result of craft, reputation and access as they are an ‘essential’ quality of the science behind them. Knowing how to write attractive titles is a start – but it is certainly not the only skill needed in this scientific reputation game.

The stakes are high with regard to scientific publishing — that much is clear. Articles in ‘top’ journals can make, break or sustain careers. One possible explanation for the status of these journals is of course that researchers have become highly reliant on external funding for the continuation of their research. And highly cited papers in high-impact journals have become the main ‘currency’ in science, as theoretical physicist Jan Zaanen called it in a lecture at our institute. The fact that articles in top journals serve as de facto proxies for the quality of researchers is perhaps not problematic in itself (or is it?). But it certainly becomes tricky if these same journals increasingly treat short-term newsworthiness as an important criterion in their publishing policies, and if peer review committee work also increasingly revolves around selecting those projects that are most likely to have short-term success. Frank Miedema (one of the initiators of Science in Transition), among others, argues in his booklet Science 3.0 that this is the case. Clearly, there is a need for thorough research into these dynamics. How prevalent are they? And what are the potential consequences for longer-term research agendas?

NWO president Jos Engelen calls for in-depth study of editorial policies of Science and Nature

The Netherlands Organization for Scientific Research (NWO) wants to start an in-depth study of the editorial policies of the most famous scientific journals, such as Science, Nature, Cell, The Lancet, The New England Journal of Medicine, and Brain. NWO president Jos Engelen announced this in a lecture on open access publishing on 11 December in Nijmegen. The lecture was given in the framework of the Honors Program of Radboud University on “Ethos in Science”.

According to Engelen, it is urgent to assess the role of these journals in the communication of scientific knowledge. Engelen wants the scientific system to shift to free dissemination of all scientific results. He sees three reasons for this. First, it is “a moral obligation to grant members of the public free access to scientific results that were obtained through public funding, through taxpayers’ money.” Engelen gets “particularly irritated when I read in my newspaper that new scientific results have been published on, say, sea level rise, to find out that I have to buy the latest issue of Nature Magazine to be properly informed.” Second, scientific knowledge gives a competitive edge to the knowledge economy and should therefore freely flow into society and the private sector. Third, science itself will profit from the free flow of knowledge between fields. “In order to face the ‘grand challenges’ of today scientific disciplines have to cooperate and new disciplines will emerge.”

Engelen wants to investigate the editorial policies of the most famous scientific journals because they stand in the way of open access. These journals feel no reason to shift their business model to open access, “because their position is practically impregnable”. Engelen takes the journal Science, published by the American Association for the Advancement of Science, as an example. “Its reputation is based on an extremely selective publishing policy and its reputation has turned ‘Science’ into a brand that sells”. Engelen remarks that the same is true for Nature, Cell and other journals published by commercial publishers. “Scientific publications are only a part, not even the dominant part of ‘the business’, but the reputation of the journal is entirely based on innovative science emanating from publicly funded research. Conversely, the reputation of scientists is greatly boosted by publications in these top-journals; top-journals with primarily an interest in selling and not in, for example, promoting the best researchers.”

Engelen concludes this part of his lecture on open access with a clear shot across the bow. “It has puzzled me for a while already that national funding organisations are not more critical about the authority that is almost automatically imputed to the (in some cases full time, professional, paid) editors of the top-journals. I think an in depth, objective study of the editorial policies, and the results thereof, commissioned by research funders, is highly desirable and in fact overdue. I intend to take initiatives along this line soon!”

Stick to Your Ribs: Interview with Paula Stephan — Economics, Science, and Doing Better

A good interview about what is wrong with the current incentives system in science and scholarship.

The Persistent Lure of the Impact Factor – Even for PLOS ONE

Bibliometrics of individual researchers

The demand for measures of individual performance in the management of universities and research institutes has been growing, in particular since the early 2000s. The publication of the Hirsch index in 2005 (Hirsch, 2005) and its popularisation by the journal Nature (Ball, 2005) have given this a strong stimulus. According to Hirsch, his index seemed the perfect indicator to assess the scientific performance of an individual author because “it is transparent, unbiased and very hard to rig”. The h-index balances productivity with citation impact: an author with an h-index of 14 has published 14 papers that have each been cited at least 14 times. So neither an author with a long list of mediocre publications nor an author with one wonder hit is rewarded by this indicator. Nevertheless, the h-index turned out to have too many disadvantages to wear the crown of “the perfect indicator”. As Hirsch acknowledged himself, it cannot be used for cross-disciplinary comparison. A field in which many citations are exchanged among authors will produce a much higher average h-index than a field with far fewer citations and references per publication. Moreover, the older one gets, the higher one’s h-index will be. And, as my colleagues have shown, the index is mathematically inconsistent, which means that rankings based on the h-index may be influenced in rather counter-intuitive ways (Waltman & Van Eck, 2012). At CWTS, we therefore prefer an indicator like the number (or percentage) of highly cited papers over the h-index (Bornmann, 2013).
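To make the definition concrete, the h-index can be computed from a list of per-paper citation counts in a few lines of code. The sketch below uses Python and invented citation numbers; it only illustrates the definition, not how citation databases compute the index in practice.

```python
def h_index(citations):
    """Largest h such that the author has h papers cited at least h times each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Invented citation counts for one author's publications.
print(h_index([50, 30, 14, 14, 14, 9, 5, 5, 2, 0]))  # prints 6
```

Note how the single highly cited paper (50 citations) and the tail of rarely cited papers contribute little: the index is driven by the middle of the distribution, which is exactly why it rewards neither the one-hit wonder nor the long list of mediocre publications.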

Still, none of the bibliometric indicators can claim to be the perfect indicator to assess the performance of the individual researcher. This raises the question of how bibliometricians and science managers should use statistical information and bibliometric indicators. Should they be avoided and should the judgment of candidates for a prize or a membership of a prestigious academic association only be informed by peer review? Or can numbers play a useful role? What guidance should the bibliometric community then give to users of their information?

This was the key topic of a special plenary at the 14th ISSI Conference two weeks ago in Vienna. The plenary was an initiative of Jochen Gläser (Technical University Berlin), Ismael Rafols (SPRU, University of Sussex, and Ingenio, Polytechnic University of Valencia), Wolfgang Glänzel (Leuven University) and myself. The plenary aimed to give a new stimulus to the debate on how to apply, and how not to apply, performance indicators of individual scientists and scholars. Although not a new debate – the pioneers of bibliometrics already paid attention to this problem – it has become more urgent because of the almost insatiable demand for objective data and indicators in the management of universities and research institutes. For example, many biomedical researchers mention the value of their h-index on their CV. In publication lists, one regularly sees the value of the Journal Impact Factor mentioned after the journal’s name. In some countries, for example Turkey and China, one’s salary can be determined by one’s h-index or by the impact factor of the journals one has published in. The Royal Netherlands Academy of Arts and Sciences also seems to ask for this kind of statistics in its forms for new members in the medical and natural sciences. Although robust systematic evidence is still lacking (we are working hard on this), the use of performance indicators in the judgment of individual researchers for appointments, funding, and memberships seems widespread, opaque and unregulated.

This situation is clearly not desirable. If researchers are being evaluated, they should be aware of the criteria used, and these criteria should be justified for the purpose at hand. This requires that users of performance indicators have clear guidelines. It seems rather obvious that the bibliometric community has an important responsibility to inform and provide such guidelines. However, at the moment there is no consensus yet about such guidelines. Individual bibliometric centres do indeed inform their clients about the use and limitations of their indicators. Moreover, all bibliometric centres have the habit of publishing their work in the scientific literature, often including the technical details of their indicators. However, this published work is not easily accessible to non-expert users such as faculty deans and research directors. The literature is too technical and distributed over too many journals and books. It needs synthesizing and translating into plain, easily understandable language.

To initiate a process of more professional guidance for the application of bibliometric indicators in the evaluation of individual researchers, we asked the organizers of the ISSI conference to devote a plenary to this problem, which they kindly agreed to. At the plenary, Wolfgang Glänzel and I presented “The dos and don’ts in individual level bibliometrics”. We do not regard this as a final list, but rather as a good start with ten dos and don’ts. Some examples: “do not reduce individual performance to a single number”, “do not rank scientists according to one indicator”, “always combine quantitative and qualitative methods”, “combine bibliometrics with career analysis”. To prevent misunderstandings: we do not want to institute a bibliometric police force with absolute rules. The context of the evaluation should always determine which indicators and methods to use. Therefore, some don’ts on our list may sometimes be perfectly usable, such as the application of bibliometric indicators to make a first selection among a large number of candidates.

Our presentation was commented on by Henk Moed (Elsevier), with a presentation on “Author Level Bibliometrics”, and by Gunnar Sivertsen (NIFU, Oslo), who drew on his extensive experience in research evaluation. Henk Moed built on the concept of the multi-dimensional research matrix published in 2010 by the European Expert Group on the Assessment of University-Based Research, of which he was a member (Assessing Europe’s University-Based Research – Expert Group on Assessment of University-Based Research, 2010). This matrix aims to give global guidance on the use of indicators at various levels of the university organization. However, it does not focus on the problem of how to evaluate individual researchers. Still, the matrix is surely a valuable contribution to the development of more professional standards in the application of performance indicators. Gunnar Sivertsen made clear that the discussion should not be restricted to the bibliometric community itself. On the contrary, the main audience for guidelines should be researchers themselves and administrators in universities and funding agencies.

The ensuing debate led to a large number of suggestions. They will be included in the full report of the meeting, which will be published in the upcoming issue of ISSI’s professional newsletter in September 2013. A key point was perhaps the issue of responsibility: it is clear that researchers themselves and the evaluating bodies should carry the main responsibility for the use of performance indicators. However, they should be able to rely on clear guidance from the technical experts. How should this balance be struck? Should bibliometricians refuse to deliver indicators when they think their application would be unjustified? Should the association of scientometricians publicly comment on misapplications? Or should this be left to the judgment of the universities themselves? The plenary did not yet resolve these issues. However, a consensus is emerging that more guidance by bibliometricians is required and that researchers should have a clear address to which they can turn with questions about the application of performance indicators, whether by themselves or by their evaluators.

What next? The four initiators of this debate in Vienna have also organized a thematic session on individual-level bibliometrics at the next conference on science & technology indicators, the STI Conference “Translational twists and turns: science as a socio-economic endeavour”, which will take place in Berlin, 4-6 September 2013. There, we will take the next step in specifying guidelines. In parallel, this conference will also host a plenary session on the topic of bibliometric standards in general, organized by iFQ, CWTS and Science-Metrix. In 2014, we will then organize a discussion with the key stakeholders, such as faculty deans, administrators, and of course the research communities themselves, on the best guidelines for evaluating individual researchers.

Stay tuned.

Bibliography:

Assessing Europe’s University-Based Research – Expert Group on Assessment of University-Based Research. (2010). Research Policy. European Commission. doi:10.2777/80193

Ball, P. (2005). Index aims for fair ranking of scientists. Nature, 436(7053), 900. Retrieved from http://dx.doi.org/10.1038/436900a

Bornmann, L. (2013). A better alternative to the h index. Journal of Informetrics, 7(1), 100. doi:10.1016/j.joi.2012.09.004

Hirsch, J. E. (2005). An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences of the United States of America, 102(46), 16569–72. doi:10.1073/pnas.0507655102

Waltman, L., & Van Eck, N. J. (2012). The inconsistency of the h-index. Journal of the American Society for Information Science and Technology, 63(2), 406–415. doi:10.1002/asi

Vice Rector University of Vienna calls for a new scientometrics

At the opening of the biennial conference of the International Society for Scientometrics and Informetrics (ISSI) in Vienna on July 16, Susanne Weigelin-Schwiedrzik, Vice Rector of the University of Vienna, called upon the participants to reorient the field of scientometrics in order to better meet the need for research performance data. She explained that Austrian universities are nowadays obliged by law to base all their decisions regarding promotion, personnel, research funding and the allocation of research funds to departments on formal external evaluation reports. “You are hosted by one of the oldest universities in Europe, it was founded in 1365. In the last couple of years, this prestigious institute has been reorganized using your scientometric data. This puts a tremendous responsibility on your field. You are no longer in the Kindergarten stage. Without your data, we cannot take decisions. We use your data to allocate research funds. We have to think twice before using your data. But you have the responsibility to realize your role in a more fundamental way. You also have to address the criticism of scientometric data. And what they represent.”

Weigelin’s passionate call for a more reflexive and critical type of scientometrics is motivated by the strong shift in Austrian university policy with respect to human resource management and research funding. In the past, the system was basically a closed shop, with many university staff members staying within their original university. The system was not very open to exchanges among universities, let alone international exchange. Nowadays, university managers need to base their decisions explicitly on external evaluations, in order to make clear that their decisions meet international quality standards. As a consequence, the systems of control at Austrian universities have exploded. To support this decision-making machinery, the University of Vienna has created a dedicated quality management department and a bibliometric department. The university has an annual budget of 380 million euros and needs to meet annual targets that are laid down in target agreements with the government.

On the second day of the ISSI conference, Weigelin repeated her plea in a plenary session on the merits of altmetrics. After a couple of presentations by Elsevier and Mendeley researchers, she said she was “not impressed”. “I do not see how altmetrics, such as download and usage data, can help solve our problem. We need to take decisions on the basis of data on impact. We look at published articles and at Impact Factors. As a researcher, I know that this is incorrect since these indicators do not directly reflect quality. But as a manager, I do not know what else to do. We are supposed to simplify the world of science. That is why we rely on your data and on the misconception that impact is equal to quality. I do not see a solution in altmetrics.” She told the audience, which was listening intently, that she receives a constant flow of evaluation reports and that the average quality of these reports is declining. “And I must say that a fair amount of the reports that are pretty useless are based on scientometric data.” Nowadays, Weigelin no longer accepts recommendations for the promotion of scientific staff that only mention bibliometric performance measures without a substantive interpretation of what the staff member actually contributes to her scientific field.

In other words, at the opening of this important scientometric conference, the leadership of the University of Vienna has formulated a clear mission for the field of scientometrics. The task is to be more critical with respect to the interpretation of indicators and to develop new forms of strategically relevant statistical information. This mission resonates strongly with the new research program we have developed at CWTS. Happily, the resonance among the participants of the conference was strong as well. The program of the conference shows many presentations and discussions that promise to at least contribute, albeit sometimes in a modest way, to solving Weigelin’s problems. It seems therefore clear that many scientometricians are eager to meet the challenge and indeed develop a new type of scientometrics for the 21st century.

The evidence on the Journal Impact Factor

The San Francisco Declaration on Research Assessment (DORA; see our most recent blog post) focuses on the Journal Impact Factor (JIF), published in the Web of Science by Thomson Reuters. It is a strong plea to base research assessments of individual researchers, research groups and submitted grant proposals not on journal metrics but on article-based metrics combined with peer review. DORA cites a few scientometric studies to bolster this argument. So what is the evidence we have about the JIF?

In the 1990s, the Norwegian researcher Per Seglen, based at our sister institute the Institute for Studies in Higher Education and Research (NIFU) in Oslo, and a number of CWTS researchers (in particular Henk Moed and Thed van Leeuwen) developed a systematic critique of the JIF, of its validity as well as of the way it is calculated (Moed & Van Leeuwen, 1996; Moed & Van Leeuwen, 1995; Seglen, 1997). This line of research has since blossomed in a variety of disciplinary contexts, and has identified three main reasons not to use the JIF in research assessments of individuals and research groups.

First, although the JIF of a particular journal depends on the aggregated citation rates of its individual articles, it cannot be used as a stand-in for those citation rates in research assessments. This is because a small number of articles are cited very heavily, while a large number of articles are cited only once in a while, and some are not cited at all. This skewed distribution is a general phenomenon in citation patterns and it holds for all journals. Therefore, if a researcher has published an article in a high-impact journal, this does not mean that her particular piece of research will also have a high impact.

Second, fields differ strongly in their usual JIF values. A field with a rapid turnover of research publications and long reference lists (such as fields in biomedical research) will tend to have much higher JIF values for its journals than a field with short reference lists in which older publications remain relevant much longer (such as fields in mathematics). Moreover, smaller fields usually have a smaller number of journals, resulting in fewer possibilities to publish in high-impact journals. As a result, it does not make sense to compare JIFs across fields. Although virtually everybody knows this, implicit comparisons are still common, for example when publications are compared on their JIF values in multi-disciplinary settings (such as in grant proposal reviews).

Third, the way in which the JIF is calculated in the Web of Science has a number of technical characteristics that make it relatively easy for journal editors to game. The JIF divides the total number of citations in a given year to the journal’s publications of the previous two years by the number of “citeable publications” in those two years. Some publications do not count as “citeable”, although they do contribute to the total number of citations when cited. By increasing the relative share of these publications in the journal, an editor can try to artificially increase the JIF. This can also be accomplished by increasing the number of publications that are more frequently cited, such as review articles, long articles, or clinical trials. Last, the editor can try to convince or pressure submitting authors to cite more publications in the journal itself. All three forms of manipulation occur, although we do not really know how frequently. Sometimes the manipulation is plainly visible: editors have written editorials about their citation impact, citing all publications from the past two years in their own journal, while admonishing authors to help increase the JIF!
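To make the arithmetic concrete, the sketch below (Python, invented numbers) computes a two-year impact factor and shows why the asymmetry between numerator and denominator matters. It is a simplification of the actual Thomson Reuters procedure, which relies on specific document type classifications.

```python
def two_year_jif(citations_to_prev_two_years, citeable_items_prev_two_years):
    """JIF for year Y: citations received in Y to the journal's publications of
    years Y-1 and Y-2, divided by the number of 'citeable' items in Y-1 and Y-2."""
    return citations_to_prev_two_years / citeable_items_prev_two_years

# Invented example: 1200 citations in year Y to items from Y-1 and Y-2.
# All citations count in the numerator, including those to editorials and news items,
# but only 400 of the journal's publications are classified as 'citeable'.
print(two_year_jif(1200, 400))  # 3.0
# If the same front matter were counted as citeable, the JIF would drop:
print(two_year_jif(1200, 500))  # 2.4
```

The gap between 3.0 and 2.4 in this invented example is precisely the room for manoeuvre that the classification of “citeable” items gives an editor.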

A more generic problem with using the JIF in research assessment is that not all fields have meaningful JIF values, since JIFs are only calculated for journals covered by the Web of Science. Scholarly fields focusing on books or technical designs are therefore disadvantaged in evaluations in which the JIF is important.

In response to these problems, five main journal impact indicators have been developed as improvements upon, or alternatives to, the JIF. First, the CWTS Journal to Field Impact Score (JFIS) improves upon the JIF because it does away with the mismatch between numerator and denominator regarding “citeable items” and because it takes field differences in citation density into account. Second, the SCImago Journal Rank (SJR) follows the same logic as Google’s PageRank algorithm: citations from highly cited journals have more influence than citations from lowly cited ones. SCImago, based in Madrid, calculates the SJR not on the basis of the Web of Science but on the basis of the Scopus citation database (published by Elsevier). A similar logic is applied in two other journal impact indicators from the Eigenfactor.org research project, based at the biology department of the University of Washington (Seattle): the Eigenfactor and the Article Influence Score (AIS). These are often calculated on the basis of the Web of Science and use a ‘citation window’ of five years (citations to an article in the previous five years count), whereas this window is two years for the JIF and three years for the SJR.
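The prestige-weighting idea behind the SJR, the Eigenfactor and the AIS can be sketched as a PageRank-style calculation on a journal-to-journal citation matrix. The code below only illustrates that core principle, with an invented three-journal matrix and a plain damping factor; the real SJR and Eigenfactor computations add further normalizations (journal size, self-citation limits, specific citation windows) that are left out here.

```python
import numpy as np

def journal_prestige(citation_matrix, damping=0.85, iterations=100):
    """PageRank-style prestige scores. citation_matrix[i][j] is the number of
    citations from journal i to journal j; prestige flows along citations, so a
    citation from a prestigious journal counts for more than one from an obscure journal."""
    c = np.asarray(citation_matrix, dtype=float)
    n = c.shape[0]
    row_sums = c.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1.0          # journals that cite nothing pass on no prestige
    transition = c / row_sums              # each journal's outgoing citations sum to one
    prestige = np.full(n, 1.0 / n)
    for _ in range(iterations):
        prestige = (1 - damping) / n + damping * prestige @ transition
    return prestige / prestige.sum()

# Invented citation matrix for three journals (rows cite columns).
C = [[0, 5, 1],
     [8, 0, 2],
     [1, 1, 0]]
print(journal_prestige(C))
```

In this toy example the first two journals end up with most of the prestige because they exchange many citations with each other, while the third journal receives few citations and therefore passes on little weight through its own references.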

The fifth journal impact indicator, the Source Normalized Impact per Paper (SNIP), is computed by CWTS on the basis of Scopus (invented by Henk Moed and further developed by Nees Jan van Eck, Thed van Leeuwen, Martijn Visser and Ludo Waltman (Waltman, Van Eck, Van Leeuwen, & Visser, 2013)). This indicator also weights citations, but not on the basis of the number of citations to the citing journal; instead it uses the number of references in the citing article. Basically, the citing paper is seen as giving out one vote, which is distributed over all cited papers. As a result, a citation from a paper with 10 references adds 1/10th to the citation count, whereas a citation from a paper with 100 references adds only 1/100th. The effect is that the SNIP indicator cancels out differences in citation density across fields (though certainly not all relevant differences between disciplines, such as the amount of work needed to publish an article). The Eigenfactor also uses this principle in its implementation of the PageRank algorithm.
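A minimal sketch of this fractional counting principle, in Python with invented data, is shown below. The real SNIP calculation additionally normalizes against the citation potential of the journal’s subject field as covered in the database, so this only illustrates the one-vote-per-citing-paper idea.

```python
from collections import defaultdict

def fractional_citation_counts(citing_papers):
    """Each citing paper hands out one 'vote', split equally over its references,
    so a citation from a paper with r references is worth 1/r."""
    scores = defaultdict(float)
    for references in citing_papers:
        if not references:
            continue
        weight = 1.0 / len(references)
        for cited in references:
            scores[cited] += weight
    return dict(scores)

# Two invented citing papers: one with 10 references, one with 100.
paper_a = ["target"] + ["other_a%d" % i for i in range(9)]    # 10 references
paper_b = ["target"] + ["other_b%d" % i for i in range(99)]   # 100 references
counts = fractional_citation_counts([paper_a, paper_b])
print(round(counts["target"], 3))  # 0.11 = 1/10 + 1/100
```

The paper cited by both sources receives 0.11 rather than 2, which is how the indicator dampens the advantage of fields with long reference lists.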

The improved journal impact indicators do solve a number of the problems that have emerged in the use of the JIF. Nevertheless, careless use of journal impact indicators in research assessments is still not justified. All journal impact indicators are in the end based on the number of citations to the individual articles in the journal. This correlation is, however, too weak to legitimize using a journal indicator as a substitute for assessing the articles themselves, if those articles are what one wishes to evaluate. When journal indicators take differences between fields into account, the citation counts of the complete sets of articles produced by research groups tend to show a somewhat stronger correlation with the journal indicators. Still, the statistical correlation remains very modest. Research groups tend to publish across a whole range of journals with both higher and lower impact factors. It will therefore usually be much more accurate to analyze the influence of these bodies of work directly rather than fall back on journal indicators.

To sum up, the bibliometric evidence confirms the main thrust of DORA: it is not sensible to use the JIF or any other journal impact indicator as a predictor of the citedness of a particular paper or set of papers. But does this mean, as DORA seems to suggest, that journal impact factors do not make any sense at all? Here I think DORA is wrong. At the level of the journal, the improved impact factors do give interesting information about the role and position of the journal, especially if this is combined with qualitative information about the peer review process, an analysis of who is citing the journal and in which context, and its editorial policies. No editor would want to miss the opportunity to analyze their journal’s role in the scientific communication process, and journal indicators can play an informative, supporting role in such an analysis. Also, it makes perfect sense in the context of research evaluation to take into account whether a researcher has been able to publish in a high-quality scholarly journal. But journal impact factors should not rule the world.

Literature:

Moed, H. F., & Van Leeuwen, T. N. (1996). Impact factors can mislead. Nature, 381(6579), 186.

Moed, H. F., & Van Leeuwen, T. N. (1995). Improving the accuracy of Institute for Scientific Information’s journal impact factors. Journal of the American Society for Information Science, 46(6), 461–467. Retrieved from http://www.iem.ac.ru/~kalinich/rus-sci/ISI-CI-IF.pdf

Seglen, P. O. (1997). Why the impact factor of journals should not be used for evaluating research. BMJ (Clinical research ed.), 314(7079), 498–502. Retrieved from http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2126010&tool=pmcentrez&rendertype=abstract

Waltman, L., Van Eck, N. J., Van Leeuwen, T. N., & Visser, M. S. (2013). Some modifications to the SNIP journal impact indicator. Journal of Informetrics, 7(2), 272–285. Retrieved from http://www.sciencedirect.com/science/article/pii/S1751157712001010

Acknowledgement:

I would like to thank Thed van Leeuwen and Ludo Waltman for their comments on an earlier draft of this post.
