The evidence on the Journal Impact Factor

The San Francisco Declaration on Research Assessment (DORA), discussed in our most recent blog post, focuses on the Journal Impact Factor (JIF), published by Thomson Reuters on the basis of the Web of Science. It is a strong plea to base research assessments of individual researchers, research groups and submitted grant proposals not on journal metrics but on article-based metrics combined with peer review. DORA cites a few scientometric studies to bolster this argument. So what is the evidence we have about the JIF?

In the 1990s, the Norwegian researcher Per Seglen, based at our sister institute, the Institute for Studies in Higher Education and Research (NIFU) in Oslo, and a number of CWTS researchers (in particular Henk Moed and Thed van Leeuwen) developed a systematic critique of the JIF, addressing both its validity and the way it is calculated (Moed & Van Leeuwen, 1995, 1996; Seglen, 1997). This line of research has since blossomed in a variety of disciplinary contexts and has identified three main reasons not to use the JIF in research assessments of individuals and research groups.

First, although the JIF of a particular journal is derived from the aggregated citation rates of its individual articles, it cannot be used as a stand-in for those articles in research assessments. This is because a small number of articles are cited very heavily, while a large number are cited only occasionally and some are not cited at all. This skewed distribution is a general phenomenon in citation patterns and holds for all journals. Therefore, the fact that a researcher has published an article in a high-impact journal does not mean that her particular piece of research will also have a high impact.
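To make the skewness argument concrete, here is a minimal sketch using invented citation counts for the 15 articles of one hypothetical journal (illustrative numbers, not real data). A JIF-like average is pulled up by two heavily cited papers (it comes out at about 13.3), while the typical (median) article receives only 2 citations, and most articles fall below the journal average.

```python
from statistics import mean, median

# Hypothetical citation counts for the 15 articles of one journal
# (invented for illustration, not real data).
citations = [0, 0, 0, 1, 1, 1, 2, 2, 3, 4, 5, 8, 12, 40, 120]

journal_average = mean(citations)    # what a JIF-like average reports
typical_article = median(citations)  # what a typical article receives
below_average = sum(1 for c in citations if c < journal_average)

print(f"journal average: {journal_average:.2f}")
print(f"median article: {typical_article}")
print(f"articles below the average: {below_average} of {len(citations)}")
```

The median is used here simply as a familiar summary of the "typical" article; any other robust statistic would make the same point.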

Second, fields differ strongly in their usual JIF values. A field with a rapid turnover of research publications and long reference lists (such as fields in biomedical research) will tend to have much higher JIF values for its journals than a field with short reference lists in which older publications remain relevant much longer (such as fields in mathematics). Moreover, smaller fields usually have a smaller number of journals, which leaves fewer opportunities to publish in high-impact journals. As a result, it does not make sense to compare JIF values across fields. Although virtually everybody knows this, implicit comparisons are still common, for example when publications are compared on their JIF values in multidisciplinary settings such as grant proposal reviews.
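One way to see why raw JIF values cannot be compared across fields is to express each value relative to the average of its own field. The sketch below uses invented JIF values and field averages, and a simple ratio as the normalization; it illustrates the idea of field normalization only and is not the definition of any CWTS indicator.

```python
# Hypothetical raw JIF values and field averages (invented for illustration).
journals = {
    "cell biology journal": {"jif": 4.0, "field_average": 5.0},
    "mathematics journal": {"jif": 1.2, "field_average": 0.8},
}


def field_relative(jif: float, field_average: float) -> float:
    """Ratio of a journal's value to the average value in its own field."""
    return jif / field_average


for name, values in journals.items():
    relative = field_relative(values["jif"], values["field_average"])
    print(f"{name}: raw JIF {values['jif']:.1f}, relative to field {relative:.2f}")
```

On raw values the cell biology journal looks more than three times as "good"; relative to their own fields, the mathematics journal is above its field average and the cell biology journal below it.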

Third, the way in which the JIF is calculated in the Web of Science has a number of technical characteristics that make it relatively easy for journal editors to game. The JIF of a journal for a given year is the number of citations received in that year by everything the journal published in the two preceding years, divided by the number of “citeable items” it published in those two years. Some publications, such as editorials and letters, do not count as “citeable”, although they do contribute to the total number of citations if they are cited. By increasing the relative share of these publications in the journal, an editor can try to artificially raise the journal’s JIF. The same effect can be achieved by publishing more of the types of publications that tend to be cited frequently, such as review articles, long articles, or clinical trials. Last, the editor can try to convince or pressure submitting authors to cite more publications in the journal itself. All three forms of manipulation occur, although we do not really know how frequently. Sometimes the manipulation is plainly visible: editors have written editorials about their journal’s citation impact that cite every publication in the journal from the past two years, while urging authors to help increase its JIF!
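The asymmetry between numerator and denominator described above can be shown with a small calculation. This is a minimal sketch with invented numbers for a hypothetical journal: 200 citeable items attracting 400 citations, after which the editor adds front matter (editorials, letters) that attracts 60 citations but is not counted as citeable.

```python
def journal_impact_factor(citations_in_year: int, citeable_items: int) -> float:
    """JIF for year Y: citations received in Y by anything the journal
    published in Y-1 and Y-2, divided by the number of 'citeable items'
    it published in Y-1 and Y-2."""
    return citations_in_year / citeable_items


# Hypothetical journal: 200 citeable items receiving 400 citations.
baseline = journal_impact_factor(400, 200)

# The editor adds editorials and letters. They are not 'citeable', so the
# denominator stays at 200, yet the 60 citations they attract still enter
# the numerator.
with_front_matter = journal_impact_factor(400 + 60, 200)

print(baseline)           # 2.0
print(with_front_matter)  # 2.3
```

Nothing about the research articles has changed, yet the JIF rises by 15%.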

A more generic problem with using the JIF in research assessment is that not all fields have meaningful JIF values, since JIF values are calculated only for the journals indexed in the Web of Science. Scholarly fields that focus on books or technical designs are therefore disadvantaged in evaluations in which the JIF carries weight.

In response to these problems, five main journal impact indicators have been developed as improvements upon, or alternatives to, the JIF. First, the CWTS Journal to Field Impact Score (JFIS) improves upon the JIF because it removes the mismatch between numerator and denominator regarding “citeable items” and because it takes field differences in citation density into account. Second, the SCImago Journal Rank (SJR) follows the same logic as Google’s PageRank algorithm: citations from highly cited journals carry more weight than citations from lowly cited ones. SCImago, based in Madrid, calculates the SJR not on the basis of the Web of Science but on the basis of the Scopus citation database (published by Elsevier). A similar logic is applied in two other journal impact indicators from the Eigenfactor.org research project, based at the biology department of the University of Washington (Seattle): the Eigenfactor and the Article Influence Score (AIS). These are often calculated on the basis of the Web of Science and use a five-year ‘citation window’ (citations received in a given year by articles published in the previous five years count), whereas the window is two years for the JIF and three years for the SJR.
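The PageRank logic shared by the SJR and the Eigenfactor can be sketched as a power iteration over a journal-to-journal citation matrix. The sketch below is a generic PageRank with an assumed damping factor of 0.85 and an invented three-journal matrix; it is not the actual SJR or Eigenfactor formula, both of which add further normalizations.

```python
def journal_pagerank(cites: list[list[int]], damping: float = 0.85,
                     iterations: int = 100) -> list[float]:
    """Generic power-iteration PageRank over a journal citation matrix.

    cites[i][j] is the number of citations from journal i to journal j
    (hypothetical input; put zeros on the diagonal to leave out
    self-citations).
    """
    n = len(cites)
    scores = [1.0 / n] * n
    outgoing = [sum(row) for row in cites]
    for _ in range(iterations):
        new_scores = [(1.0 - damping) / n] * n
        for i in range(n):
            if outgoing[i] == 0:
                # A journal that cites nothing spreads its weight evenly.
                for j in range(n):
                    new_scores[j] += damping * scores[i] / n
            else:
                # Journal i passes on its current score in proportion to
                # where its citations go, so a citation from a
                # high-scoring journal carries more weight.
                for j in range(n):
                    new_scores[j] += damping * scores[i] * cites[i][j] / outgoing[i]
        scores = new_scores
    return scores


# Hypothetical journals A, B, C: A and C cite only B; B splits its
# citations evenly between A and C.
citation_matrix = [
    [0, 10, 0],
    [5, 0, 5],
    [0, 10, 0],
]
scores = journal_pagerank(citation_matrix)
print([round(s, 3) for s in scores])
```

Journal B ends up with the highest score because it is cited by both other journals; A and C, which are cited only by B, share the rest equally.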

The fifth journal impact indicator, computed by CWTS on the basis of Scopus, is the Source Normalized Impact per Paper (SNIP), invented by Henk Moed and further developed by Nees Jan van Eck, Thed van Leeuwen, Martijn Visser and Ludo Waltman (Waltman, Van Eck, Van Leeuwen, & Visser, 2013). This indicator also weights citations, but not on the basis of the number of citations to the citing journal; instead, it uses the number of references in the citing article. Basically, the citing paper is seen as casting one vote, which is distributed over all the papers it cites. As a result, a citation from a paper with 10 references adds 1/10 to the citation count, whereas a citation from a paper with 100 references adds only 1/100. The effect is that SNIP corrects for differences in citation density across fields (though certainly not for all relevant differences between disciplines, such as the amount of work needed to publish an article). The Eigenfactor also uses this principle in its implementation of the PageRank algorithm.
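The "one vote per citing paper" principle can be written down in a few lines. This is a minimal sketch of fractional citation weighting only, with invented reference counts; it is not the full SNIP computation, which involves further choices (for example about citation windows and which references count).

```python
def fractionally_weighted_citations(reference_counts: list[int]) -> float:
    """Each citing paper casts one vote, split evenly over its references,
    so a citation from a paper with n references counts as 1/n."""
    return sum(1.0 / n for n in reference_counts if n > 0)


# Hypothetical: a paper cited by three papers with 10, 100 and 20 references.
weighted = fractionally_weighted_citations([10, 100, 20])

# Hypothetical dense field: 10 citations, each from a paper with 50 references.
dense_field = fractionally_weighted_citations([50] * 10)
# Hypothetical sparse field: 2 citations, each from a paper with 10 references.
sparse_field = fractionally_weighted_citations([10] * 2)

print(round(dense_field, 3), round(sparse_field, 3))  # both 0.2
```

The second pair shows the intended effect: five times as many raw citations in the field with long reference lists yield the same weighted score.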

The improved journal impact indicators do solve a number of problems that have emerged in the use of the JIF. Nevertheless, careless use of journal impact indicators in research assessments is not justified. All journal impact indicators are ultimately based on the number of citations to the individual articles in the journal. However, the correlation between a journal’s indicator value and the citations received by its individual articles is too weak to justify using the journal indicator instead of assessing the articles themselves, if one wishes to evaluate those articles. When the journal indicators take field differences into account, the citation counts of the sets of articles produced by research groups as a whole tend to correlate somewhat more strongly with them. Still, the statistical correlation remains very modest. Research groups tend to publish across a whole range of journals with both high and low impact factors. It will therefore usually be much more accurate to analyze the influence of these bodies of work directly rather than fall back on journal indicators.

To sum up, the bibliometric evidence confirms the main thrust of DORA: it is not sensible to use the JIF or any other journal impact indicator as a predictor of the citedness of a particular paper or set of papers. But does this mean, as DORA seems to suggest, that journal impact factors do not make any sense at all? Here I think DORA is wrong. At the level of the journal, the improved impact factors do give interesting information about the role and position of the journal, especially if combined with qualitative information about the peer review process, an analysis of who is citing the journal and in which context, and its editorial policies. No editor would want to miss the opportunity to analyze the journal’s role in the scientific communication process, and journal indicators can play an informative, supporting role in this. Also, it makes perfect sense in the context of research evaluation to take into account whether a researcher has been able to publish in a high-quality scholarly journal. But journal impact factors should not rule the world.

Literature:

Moed, H. F., & Van Leeuwen, T. N. (1995). Improving the accuracy of Institute for Scientific Information’s journal impact factors. Journal of the American Society for Information Science, 46(6), 461–467. Retrieved from http://www.iem.ac.ru/~kalinich/rus-sci/ISI-CI-IF.pdf

Moed, H. F., & Van Leeuwen, T. N. (1996). Impact factors can mislead. Nature, 381(6579), 186.

Seglen, P. O. (1997). Why the impact factor of journals should not be used for evaluating research. BMJ (Clinical research ed.), 314(7079), 498–502. Retrieved from http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2126010&tool=pmcentrez&rendertype=abstract

Waltman, L., Van Eck, N. J., Van Leeuwen, T. N., & Visser, M. S. (2013). Some modifications to the SNIP journal impact indicator. Journal of Informetrics, 7(2), 272–285. Retrieved from http://www.sciencedirect.com/science/article/pii/S1751157712001010

Acknowledgement:

I would like to thank Thed van Leeuwen and Ludo Waltman for their comments on an earlier draft of this post.

DORA – a stimulus for a new evaluation culture in science

We should urgently improve the ways in which the output of research is assessed by universities and funding agencies, and the dominance of the Journal Impact Factor in these evaluations should therefore end. This is the gist of a call published a week ago by a large group of prominent researchers and research institutes: the San Francisco Declaration on Research Assessment (DORA). The initiative started at a conference in San Francisco last December, organized by the American Society for Cell Biology, and this origin shows in the list of signatories and in the accompanying editorials. The declaration went live together with editorials in Science and in life-science journals such as The EMBO Journal, Molecular Biology of the Cell, eLife, and Traffic. At the moment of writing, more than a thousand individual researchers have signed the declaration, as well as over a hundred scientific institutions, among them AAAS, the Wellcome Trust, EMBO, HEFCE, PNAS, PLOS and the Open Knowledge Foundation.

DORA has mostly been welcomed, and rightly so, by experts in scientometrics, bibliometrics and science policy and by leaders of academic institutions. This is not because they are declared enemies of the Journal Impact Factor, but because of the narrow-mindedness of assessment systems centered on one indicator, which by definition can capture only a narrow slice of the relevant dimensions of scientific performance. DORA focuses on the JIF, produced by Thomson Reuters in its Journal Citation Reports, but some of the arguments also hold for performance indicators in general. The strength of DORA is its plea for recognizing the diversity of types of scientific output, which should be met by a diversity of measures, both qualitative and quantitative. Moreover, the increasingly web-based style of working in science and scholarship enables more advanced and refined measures of production, impact, and influence than the often rather crude approximation offered by indicators such as the JIF (but this depends on what one wants to measure!).

DORA cites the critique of the JIF as it has been developed over decades of bibliometric and science policy research since the early 1990s. The main problems mentioned are: the strong variation of JIF values across fields, which makes it meaningless to compare JIF values across fields or even sub-fields; the skewed distribution of citations over the articles within a journal, which means that the journal average cannot be taken as a predictor of the citations a particular article will receive; and the relatively easy ways in which the JIF can be gamed by journal editors. This body of research is fairly well summarized, albeit not cited comprehensively.

The main weaknesses of DORA show in its specific recommendations and in some confusion between problems specific to the JIF and more generic problems of performance indicators. For example, DORA seems to want to do away entirely with journal-based indicators while at the same time recommending additional journal indicators. (More on this in a future post.)

Yet the main thrust of DORA is in line with the need to correct for, or warn against, excessive reliance on formalized indicators in many universities and institutes. This reliance may have grown at the expense of a well-balanced form of informed peer review, although we should also not underestimate the large amount of very well-designed evaluation work that is conducted every day. Of course, peer review itself must also be kept honest by, among other things, well-developed indicators of a variety of dimensions of the process of knowledge creation (such as network positions and gender relationships).

Last year, CWTS published its new research program. One of its main themes is precisely the urgent need to innovate the current systems of research assessment, and the related need to support this with a new research agenda in scientometrics (more on this in a future post). Also, at CWTS we are coordinating the European research project ACUMEN, which aims to support researchers in evaluations with a portfolio of qualitative and quantitative evidence that is valid and reliable at the level of the individual researcher. This project is a large-scale collaboration with a host of scientometric, webometric and science policy experts and researchers. And we know that many of our colleagues are thinking along the same lines. So it should definitely be possible to build a strong coalition in favor of evaluation practices that are more conducive to the further development of science and creativity.

Next post: a summary of the evidence on JIF

Worldwide diversification of research continues

Last Wednesday, we published the new edition of the Leiden Ranking, and the results are quite interesting. The range of countries with universities that score high on their number of highly cited publications is increasing. Thirteen countries are now represented in the top hundred: the US (57 universities), the UK (16), Switzerland and the Netherlands (6 each), China (4), Singapore, Canada and Germany (2 each), and Israel, Denmark, Ireland, South Korea and Australia (1 university each).

Clearly, the US still dominates: the first 12 universities are all based in the US. As last year, MIT leads the ranking, with no less than one quarter of its publications among the 10% most cited publications in their field (in this calculation, we also take the publication year into account). The largest research university in the world, Harvard, is number five, with an impressive one-fifth of its papers published between 2008 and 2011 among the 10% most cited papers in their field. Note that when the option “fractional counting” is checked, each paper is divided equally among all universities listed in its author addresses. This prevents double counting, but it does not reflect the total number of papers originating from a university. For example, Harvard has produced almost 57,000 papers, many of them with other universities, which results in a “fractionalized” number of almost 30,000 papers, of which one-fifth score in the 10% most cited segment.
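Fractional counting and the share of top-10% papers can be illustrated with a toy publication list. This is a minimal sketch under the assumption stated above (each paper is split equally over the distinct universities in its addresses); the university names, the `Paper` type and the function `leiden_style_counts` are hypothetical and do not reproduce the Leiden Ranking's actual data pipeline.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    universities: list[str]  # universities listed in the author addresses
    top10: bool              # among the 10% most cited of its field and year


def leiden_style_counts(papers: list[Paper], university: str) -> tuple[int, float, float]:
    """Return (full count, fractional count, top-10% share under fractional counting)."""
    full = 0
    fractional = 0.0
    fractional_top10 = 0.0
    for paper in papers:
        if university not in paper.universities:
            continue
        # Each paper is split equally over the distinct universities on it.
        weight = 1.0 / len(set(paper.universities))
        full += 1
        fractional += weight
        if paper.top10:
            fractional_top10 += weight
    share_top10 = fractional_top10 / fractional if fractional else 0.0
    return full, fractional, share_top10


# Hypothetical publication list for universities "U1", "U2" and "U3".
papers = [
    Paper(["U1"], True),
    Paper(["U1", "U2"], False),
    Paper(["U1", "U2", "U3"], True),
    Paper(["U1"], False),
]
full, fractional, share = leiden_style_counts(papers, "U1")
print(full, round(fractional, 2), round(share, 2))
```

U1 appears on 4 papers but is credited with about 2.83 under fractional counting, and the top-10% share is computed over those fractional weights, which is why a heavily collaborating university like Harvard shows a much smaller fractional than full publication count.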

China is steadily increasing the impact of its research. Whereas in the recent past China rose quickly in terms of the production of scientific papers but not so much in terms of scientific influence, we now see that research from Chinese universities is gaining citations. Two Chinese universities, Nankai and Hunan, even score higher on the highly cited indicator than the highest-ranking Dutch universities (Leiden University and Utrecht University): almost 14.5% of their publications belong to the top 10% most cited in their field. The diversification also shows outside the top 100. For example, China has 37 universities in the Leiden Ranking 2013 (of which 6 are newcomers), Iran has 5 (all of them new), and Brazil has 10 (2 newcomers). This trend is the result of three effects. First, many universities are increasing their share of scientific production. Second, the number of scientific papers is rising overall, which results in a steady increase in the size of the Web of Science database, on which the Leiden Ranking is based. Third, we have become better at correctly identifying universities in the address field of scientific publications. We suspect, for example, that this contributes to the rise of Iran in the Leiden Ranking.

Of course, the ranking also shows areas in which the citation impact is lower than expected. What struck me is that the Japanese universities (including the prestigious Tokyo University) all score below the world average. This is also true for all universities from some of the newcomer countries, such as Iran, but also, somewhat more surprisingly, for those from Norway, Brazil, Poland, Italy, Greece, Portugal, Russia, Turkey, and Taiwan.

Why do neoliberal universities play the numbers game?

Performance measurement has brought on a crisis in academia. At least, that is what Roger Burrows (Goldsmiths, University of London) claims in a recent article in The Sociological Review. According to Burrows, academics are at great risk of being overwhelmed by a ‘deep, affective, somatic crisis’, brought on by the ‘cultural flattening of market economic imperatives’ that fuels increasingly convoluted systems of measurement. Burrows places this emergence of quantified control in academia within the broader context of neoliberalism. Though this has been argued before, Burrows gives the discussion a theoretical twist by drawing on Gane’s (2012) analysis of Foucault’s (1978–1979) lectures on the relation between market and state under neoliberalism. According to Foucault, neoliberal states can only guarantee the freedom of markets if they apply the same ‘market logic’ to themselves. In this view, the standard depiction of neoliberalism as passive statecraft is incorrect: this type of management is not ‘laissez-faire’, but actively stimulates competition and privatization strategies.

In the UK, Burrows contends, the simulation of neoliberal markets in academia has largely been channelled through the introduction of audits and performance measures. He argues that these control mechanisms become autonomous entities that are increasingly used outside the original context of evaluation, and that they take on a much more active role in shaping the everyday work of academics. According to Burrows, neoliberal universities provide fertile ground for a “co-construction of statistical metrics and social practices within the academy.” Among other things, this leads to a reification of individual performance measures such as the h-index. Burrows:

“[I]t is not the conceptualization, reliability, validity or any other set of methodological concerns that really matter. The index has become reified; (…) a number that has become a rhetorical device with which the neoliberal academy has come to enact ‘academic value’.” (p. 361)

Interestingly, Burrows’ line of reasoning can in some respects itself be seen as a product of a broader neoliberal context. Neoliberal policies applaud personal autonomy and individuals’ responsibility for their own well-being and professional success. Burrows directly addresses fellow academics (‘we need to obtain critical distance’; ‘we need to understand ourselves as academics’; ‘why do we feel the way we do?’) and concludes that we are all implicated in the ‘autonomization of metric assemblages’ in the academy. Arguably, it is exactly this neoliberal political climate that justifies Burrows’ focus on individual academics’ affective states. With it comes a delegation of responsibility to the level of individual researchers: it is our own choice whether we comply with the metricization of academia, and our own choice whether we work long hours and spend our weekends writing grant proposals and articles and grading students’ exams. According to Gill (2010), academics tend to justify working so hard by their passionate drive for self-expression and the pleasure they take in intellectual work. Paradoxically, Gill argues, it is this drive that feeds a whole range of disciplinary mechanisms and that leads academics to internalize a neoliberal subjectivity. We play ‘the numbers game’, as Burrows calls it, because of “a deep love for the ‘myth’ of what we thought being an intellectual would be like.” (p. 15)

Though Burrows raises concerns that are shared by many academics, it is unfortunate that he does not substantiate his claims with empirical data. Apart from his own experience and anecdotal evidence, how do we know that today’s researchers experience the metricization of academia as a ‘deep, affective, somatic crisis’? Does it apply to all researchers, is it the same everywhere, and does it hold for all disciplines? These are empirical questions that Burrows does not answer. That said, there is a great need for the types of analyses Burrows and Gill provide: analyses that assess, situate and historicize academic audit cultures. It is no coincidence that Burrows’ polemical piece emerges from the field of sociology. The social sciences and humanities are increasingly confronted with what Burrows calls the ‘rhetoric of accountability’. It has become commonplace to argue that they, too, should be held accountable for the taxpayers’ money spent on them, and that these disciplines, too, should be made auditable through standardized, transparent performance measures. I agree with Burrows that this rhetoric should be problematized. In large parts of these fields it is not at all clear how performance should be ‘measured’ in the first place, for example because publication cultures differ within these fields and from those of the natural sciences. And it is precisely because the discussion is ongoing that we have a clear view of the performative effects of a very specific and increasingly dominant evaluation culture that was neither developed by nor modelled on these disciplines. What are the consequences? And are there more constructive alternatives?

Plea for assessments, against bean counting – part 3

Valorization of research has become an increasingly important pillar of research evaluation, and the LERU report “Research universities and research assessment” acknowledges this development. The report does not take a strong stand but limits itself to a cautious, preliminary appraisal of impact assessment. It gives an overview of the British, US, and European approaches to “impact” as an evaluation criterion. In the UK, one fifth of the grade in the new Research Excellence Framework will be awarded on the basis of a combination of the “reach” of the impact and its “significance”. Universities are asked to present case studies as empirical evidence of the societal impact of their research. The LERU report points to the resource intensiveness of this approach as well as to the novelty of this type of measurement for academia. Panel members will have to develop expertise in this area. Also, the ways in which research may have a wider impact on society will vary strongly by research field.

In the US, a different, large-scale, data-oriented route has been taken with the STAR METRICS project, funded by NIH, NSF and the White House Office of Science and Technology Policy. There is no lack of ambition in the US project. According to Francis Collins, Director of NIH, STAR METRICS will “yield a rigorous, transparent review of how our science investments are performing. In the short term, we’ll know the impact on jobs. In the long term, we’ll be able to measure patents, publications, citations, and business start-ups”. LERU warns that this might be too optimistic: “Already anecdotal evidence suggests that a number of anomalies appear to be occurring. There is concern about coverage especially in disciplines that focus on highly selective and tightly focused conference proceedings, traditional journals being deemed too slow. In addition, it is thought that there may be perverse effects on young new investigators.”

Not mentioned in the report is the role of commercial companies in research assessment. This is a growing market, and the increasing pressure on university budgets has the paradoxical effect of making research assessments and bibliometric analyses even more important. As a result, commercial companies have developed aggressive strategies to attract universities as clients. Some universities have developed spin-off companies, and CWTS itself is in fact an example of such a hybrid of a research centre and a commercial service provider. This has been the state of affairs from the very beginning of scientometrics as a field of research, so there is nothing new here. Still, universities need to be aware of potential conflicts of interest between themselves and the companies producing information about their research. A good strategy might be to always retain ownership of the data produced by the university and to promote open access where possible. Universities are starting to develop campus-wide policies, and they might have profited from LERU advice on this topic.

Last, but not least, the LERU report does not discuss the changing demographics of the research population at universities and the acute need for universities to develop a more future oriented career policy. According to many specialists, the way universities develop their human resource management might very well decide how they will fare. An important question is how research evaluations are affecting the development of research careers and to what extent they are producing perverse effects. The fact that this is not mentioned at all in the LERU report is a missed opportunity in an otherwise balanced and carefully written policy report.

Plea for assessments, against bean counting – part 2

The LERU report “Research universities and research assessment” is partly inspired by problems that university managers have encountered in their attempts to evaluate the performance of their institutions. In her presentation at the launch event of the report, Mary Phillips concluded that universities use assessments in a variety of ways. First, they want to know their output, impact and quality for the allocation of funds, performance improvement, and maximization of ‘return on investment’. Second, research assessments are used to inform strategic planning. Third, they are applied to identify excellent researchers in the context of attracting and retaining them. Fourth, they are used to monitor the performance of individual units in the university, such as departments or faculties. Fifth, research assessments are used to identify current and potential partners for scientific collaboration. And last, they are used at the level of the universities to benchmark against their peers.

Given this variety of applications, it is not surprising that universities encounter a number of problems, several of which the report identifies. The relevant data are diverse and currently often not integrated in tools. There is a lack of agreement on definitions and standards; for example, who counts as a ‘researcher’ may differ between university systems. Also, funding mechanisms are still predominantly national and differ significantly. Finally, many databases used in research assessments are proprietary and cannot be controlled by the universities themselves. Moreover, Phillips signals that perverse effects can be expected from current assessment procedures: measurement cultures may “distract from the academic mission”. It is important to be aware of disciplinary differences, for example with respect to numbers of citations and the relevant time frames of measurement. Last, the report mentions that academics may feel threatened by research assessments.

In addition to these problems and dangers, the report identifies two important novel developments in research assessment: the European project to rank universities in multiple dimensions (U-Multirank), and the recent emphasis on the societal impact of research in evaluations. The report is rather critical of U-Multirank. The project, in which CWTS participates, aims to address a major problem in current university rankings. Apart from the research-focused rankings, such as the Leiden Ranking or the SCImago ranking, global rankings have combined different dimensions, such as the quality of education and research output, in an arbitrary way. Also, they apply one model to all universities. However, universities may have very different missions, so it makes more sense to compare universities with similar missions. “According to the multidimensional approach a focused ranking does not collapse all dimensions into one rank, but will instead provide a fair picture of institutions (‘zooming in’) within the multi-dimensional context provided by the full set of dimensions.” (U-Multirank) In principle, LERU supports this approach, and it was also involved in the first-stage feasibility study. However, a number of concerns have led LERU to disengage from the project.

“Our main concerns relate to the lack of good or relevant data in several dimensions, the problems of comparability between countries in areas such as funding, the fact that U-Multirank will not attempt to evaluate the data collected, i.e. there will be no “reality-checks”, and last but by no means least, the enormous burden put upon universities in collecting the data, resulting in a lack of involvement from a good mix of different types of universities from all over the world, which renders the resulting analyses and comparisons suspect.” These concerns have led LERU to turn away from rankings as an instrument of assessment. The European Commission has not followed this reasoning and has recently decided to publish a call for the second stage of the U-Multirank project. The consortium has not yet publicly replied to LERU’s critique.

Plea for assessments, against bean counting – part 1

“Above all, universities should stand firm in defending the long-term value of their research activity, which is not easy to assess in a culture where return on investment is measured in very short time spans.” This is the main motif of a new position paper recently published by the League of European Research Universities (LERU) about the way universities should handle the evaluation of research. In many ways, it is a sensible report that tries to strike a careful balance between the different interests involved. The report is written by Mary Phillips, former director of Research Planning at University College London and currently adviser to Academic Analytics, a for-profit consultancy in the area of research evaluation (and hence one of CWTS’ competitors). The report is a plea for the combined application of peer review and bibliometrics by university management. It also contains a number of principles that LERU would like to see implemented by universities in their assessment procedures.

The point of departure of the report is the observation that assessments have become part and parcel of the university. At the same time, the types of possible assessments and the methodologies available have exploded. This stimulates “an obsession with measurement and monitoring, which may result in a ‘bean counting’ culture detracting from the real quality of research”. Indeed, this has already begun. The dilemmas are made worse by the fact that universities need to deal with large quantities of data and require sophisticated human resource and research management tools, which they often lack. On top of all this, funding regimes tend to create incentives that may tempt universities to, as the report puts it with a feeling for understatement, “behave in certain ways, sometimes with unfortunate consequences”.

One of the implications is that any assessment system must be sensitive to possible perverse incentives, should take disciplinary differences into account, and should cover a sufficiently long time frame, at least five years according to the report. Assessments should “reflect the reality of research”, including the aspirations of the researchers involved. “Thus, senior administrators and academics must take account of the views of those ‘at the coal-face’ of research.” Assessments should be “as transparent as possible”. Universities are advised to improve their data management systems. And researchers “should be encouraged (or compelled) when publishing, to use a unique personal and institutional designation, and to deposit all publications into the university’s publications database”.

Universities demand full transparency of university rankings

Last week, I attended a two-day conference in Mexico City at which the rectors of 65 Latin American universities discussed global university rankings. The meeting concluded by adopting a “Final Declaration” signed by the majority of attending universities. At times, emotions ran high in the debate. Clearly, many university leaders felt badly served by most global university rankings. In this, they were supported by the keynote speaker, Simon Marginson, a higher education expert from the University of Melbourne (Australia). He gave an excellent speech in which he showed how most rankings are based on a particular model of higher education as a globalized market. In this framework, US universities are dominant. Many rectors were of the opinion that the social mission of Latin American universities is not valued in this model. Moreover, performance at the international research front dominates most rankings, including our own Leiden Ranking. Latin American universities do not score highly, if they make it into the ranking at all.

The meeting was organized by Imanol Ordorika, director of institutional evaluation at the National Autonomous University of Mexico (UNAM). A former leader of the 1987 student demonstrations, he focuses both on international research (in the field of higher education) and on the social role of universities. The countries of Latin America are confronted with high levels of corruption, enormous economic and social inequalities, and the need for much better mass education. Although these universities are huge (UNAM has more than 300 thousand students), they still cannot accommodate all young people who aspire to study. Approximately one-fifth of Latin America’s youth neither studies nor works. It is no wonder that university rectors worry not only about their international research effort but at least as much about their role in improving the educational system in their countries.

Against this background, the well-known deficiencies of many global university rankings become even more pressing. This was also the reason for organizing the conference. Increasingly, universities that score low, or do not appear at all, in rankings such as the Times Higher Education World University Rankings, the QS World University Rankings, the Academic Ranking of World Universities (the Shanghai Ranking) or the Ranking Web of World Universities are questioned about their performance. According to the declaration adopted at the conference, the current rankings have many undesirable effects, such as a homogenizing impact in which the elite US-based research university is dominant, a biased perception of the performance of Latin American universities, an undermining of the legitimacy of national higher education institutions, and the mistaken tendency to see rankings as information systems.

Key problems in the global rankings discussed were: the arbitrary way in which different indicators are combined into one composite indicator; the lack of visibility of the humanities and social sciences; the neglect of the social and cultural impact of universities; and, last but not least, the lack of transparency of both the methods and the data used to calculate the indicators. The Leiden Ranking was praised for its transparency and its focus, as was the SCImago Ranking. Participants found it helpful that these rankings state very explicitly what they do and do not measure. Of course, these rankings do not make it possible to compare the universities’ social missions. For this, other measures are needed.
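To make the point about arbitrary aggregation concrete, here is a minimal Python sketch, assuming a simple weighted-sum composite indicator. The three universities, the indicator names, and all scores and weights are invented for illustration and are not taken from any actual ranking. The same underlying scores yield opposite orderings depending only on the chosen weights.

```python
# Hypothetical indicator scores (0-100) for three invented universities.
# Names, indicators, scores and weights are illustrative only.
SCORES = {
    "University A": {"research": 90, "teaching": 50, "social": 40},
    "University B": {"research": 60, "teaching": 80, "social": 70},
    "University C": {"research": 70, "teaching": 60, "social": 90},
}

# Two hypothetical weighting schemes, expressed as relative integer weights.
RESEARCH_HEAVY = {"research": 6, "teaching": 3, "social": 1}
EQUAL = {"research": 1, "teaching": 1, "social": 1}


def composite_ranking(scores, weights):
    """Order institutions by the weighted sum of their indicator scores."""
    totals = {
        name: sum(weights[ind] * value for ind, value in indicators.items())
        for name, indicators in scores.items()
    }
    # Highest composite score first.
    return sorted(totals, key=totals.get, reverse=True)


print(composite_ranking(SCORES, RESEARCH_HEAVY))
# prints: ['University A', 'University C', 'University B']
print(composite_ranking(SCORES, EQUAL))
# prints: ['University C', 'University B', 'University A']
```

Neither weighting is more “correct” than the other; the choice is a policy decision hidden inside a single number, which is why transparency of methods and data matters so much.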

The “Final Declaration” called on governments in Latin America to avoid using rankings as elements in evaluating the performance of universities. They were also advised to encourage the creation of public databases that permit well-founded knowledge of the performance of the higher education system. The ranking producers were called upon to adhere to the 2006 “Berlin Principles on Ranking of Higher Education Institutions”. Rankings should be 100% transparent. Ranking producers should also engage in more interaction with the universities. The declaration notes that there is currently no consensus on criteria for measuring the quality of universities. “Any selection of parameters or quantitative indicators to sum up the qualities of universities is rather arbitrary”. The media are admonished to provide more balanced coverage of the rankings. And the universities in Latin America are encouraged to adopt policies that promote transparency, accountability and open access. Rankings can play a role here. However, universities should not sacrifice “our fundamental responsibilities” in order to implement “superficial strategies designed to improve our standings in the rankings”.

Paul Wouters

On organizational responses to rankings

From 13 to 15 September 2012, the departments of Sociology and Anthropology at Goldsmiths are hosting an interdisciplinary conference on ‘practicing comparisons’. Here’s the call for papers. We submitted the following abstract together with Roland Bal and Iris Wallenburg (Institute of Health Policy and Management, Erasmus University). This cooperation is part of a new line of research on the impacts of evaluation processes on knowledge production.

“Comparing comparisons. On rankings and accounting in hospitals and academia

Not much research has yet been done on the ways in which rankings affect academic and hospital performance. The little evidence there is focuses on the university sector. Here, an interest in rankings is driven by a competition in which universities are being made comparable on the basis of ‘impact’. The rise of performance-based funding schemes is one of the driving forces. Some studies suggest that shrinking governmental research funding from the 1980s onward has resulted in “academic capitalism” (cf. Slaughter & Leslie 1997). By now, universities have set up special organizational units and have devised specific policy measures in response to ranking systems. Recent studies point to the normalizing and disciplining powers associated with rankings and to ‘reputational risk’ as explanations for organizational change (Espeland & Sauder 2007; Power et al. 2009; Sauder & Espeland 2009). Similar claims have been made for the hospital sector in relation to the effects of benchmarks (Triantafillou 2007). Here, too, we witness a growing emphasis on ‘reputation management’, and on the use of rankings in quality assessment policies.

The modest empirical research done thus far mainly focuses on higher management levels and/or on large institutional infrastructures. Instead, we propose to analyze hospital and university responses to rankings from a whole-organization perspective. Our work zooms in on so-called composite performance indicators that combine many underlying specific indicators (e.g. patient experiences, outcome, and process and structure indicators in the hospital setting, and citation impact, international outlook, and teaching in university rankings). Among other things, we are interested in the kinds of ordering mechanisms (Felt 2009) that rankings bring about on multiple organizational levels – ranging from the managers’ office and the offices of coding staff to the lab benches and hospital beds.

In the paper, we first of all analyze how rankings contribute to making organizations auditable and comparable. Secondly, we focus on how rankings translate, purify, and simplify heterogeneity into an ordered list of comparable units, and on the kinds of realities that are enacted through these rankings. Thirdly, and drawing on recent empirical philosophical and anthropological work (Mol 2002, 2011; Strathern 2000, 2011; Verran 2011), we ask how we as analysts ‘practice comparison’ in our attempt to make hospital and university rankings comparable.”

Journal ranking biased against interdisciplinary research

The widespread use of journal rankings in research institutes and universities creates a disadvantage for interdisciplinary research in assessment exercises such as the British Research Excellence Framework. This is the conclusion of a paper presented at the 2011 Annual Conference of the Society for the Social Studies of Science in Cleveland (US) by Ismael Rafols (SPRU, Sussex University), Loet Leydesdorff (University of Amsterdam) and Alice O’Hare, Paul Nightingale and Andy Stirling (all SPRU, Sussex University). The study provides the first quantitative evidence that researchers working at the boundaries between different research fields may be disadvantaged compared with their monodisciplinary colleagues. It argues that citation analysis, if properly applied, is a better measurement instrument than a ranked journal list.

The study is highly relevant for research management at universities and research institutes. Journal lists have become a very popular management tool. In many departments, researchers are obliged to publish in a limited set of journals. Some departments, for example in economics, have even been reorganized on the basis of whether their researchers had published in journals on such a list. The way these lists are composed varies. Sometimes a group of experts decides whether a journal belongs on the list; sometimes the Journal Impact Factor published by ISI/Thomson Reuters is the determining factor.

The study by Rafols et al. analyzed one such list: the ranked journal list used by the British Association of Business Schools. This list is based on a mix of citation statistics and peer review. It ranks scholarly journals in business and management studies in five categories, from “modest standard journals” (category 1) to “world elite journals” (category 4*). This scheme mirrors the categories researchers know from the Research Assessment Exercise. The ranked journal list is meant to be used widely for a variety of management goals. It is used to advise researchers on the best venue for their manuscripts. Libraries are supposed to use it in their acquisition policies. And last but not least, it is used in research assessments and personnel evaluations. Although the actual use of the list is an interesting research topic in itself, we can safely assume that it has had a serious impact on researchers in the British business school community.

The study shows, first of all, that the position of a journal in the ranked list correlates negatively with the journal’s degree of interdisciplinarity. In other words, the higher the ranking, the narrower the journal’s disciplinary focus. (The study used a number of interdisciplinarity indicators, each capturing a different aspect of what it means to be interdisciplinary.) Rewarding researchers primarily for publishing in journals on the ranked list may therefore discourage interdisciplinary work.
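One common way to quantify a relation between an ordinal list position and a continuous score is Spearman’s rank correlation. The Python sketch below is purely illustrative: the journal categories and interdisciplinarity scores are invented, category 4* is coded as 5 to preserve the ordering, and we do not claim this is the exact statistic or data used by Rafols et al. A negative coefficient means that higher-ranked journals tend to have lower interdisciplinarity scores.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical journals: list category (1 = "modest standard",
# 5 stands in for 4* "world elite") and an invented interdisciplinarity score.
categories = [1, 1, 2, 3, 4, 5]
interdisciplinarity = [0.8, 0.7, 0.6, 0.5, 0.3, 0.2]

rho = spearman(categories, interdisciplinarity)
print(f"Spearman rho = {rho:.3f}")  # prints: Spearman rho = -0.986
```

Using ranks rather than raw values suits an ordinal scale like the list categories, where only the order, not the distance between categories, is meaningful.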

The study then examines this effect by comparing business and management studies with innovation studies. Both fields are subject to the same evaluation regime in the Research Excellence Framework, and intellectually they are very close. However, they differ markedly in their interdisciplinary nature. Researchers in business schools have a more traditional publishing behaviour than their innovation studies colleagues, and the research units in innovation studies are consistently more interdisciplinary than the business and management schools.

Of course, publication behaviour is shaped by a variety of influences. Peer review may be biased against interdisciplinary work because its quality is more difficult to assess, and many top journals are not eager to publish interdisciplinary work. This study is the first to show convincingly that these existing biases tend to be reinforced by the use of ranked journal lists as a research management tool. It does so by comparing the performance of the research units as measured by the ranked journal list with their performance in a citation analysis. In the citation analysis, innovation studies research is not penalized for its more interdisciplinary character, whereas in the assessment based on the journal list it is. The paper concludes with a discussion of the negative implications for funding and resource acquisition for research groups working at the boundaries between fields.

The paper will be published in a forthcoming issue of Research Policy and received the best paper award at the Atlanta Conference on Science and Innovation Policy in September 2011.

Reference: Ismael Rafols, Loet Leydesdorff, Alice O’Hare, Paul Nightingale, & Andy Stirling, “How journal rankings can suppress interdisciplinary research. A comparison between innovation studies and business & management,” paper presented at the Annual Meeting of the Society for the Social Studies of Science (4S), Cleveland, OH, Nov. 2011; available at http://arxiv.org/abs/1105.1227.
