Plea for assessments, against bean counting – part 2

The LERU report “Research universities and research assessment” is partly inspired by problems that university managers have encountered in their attempts to evaluate the performance of their institutions. In her presentation at the launch event of the report, Mary Phillips concluded that universities use assessments in a variety of ways. First, they want to know their output, impact and quality for the allocation of funds, performance improvement, and the maximization of ‘return on investment’. Second, research assessments are used to inform strategic planning. Third, they are applied to identify excellent researchers in order to attract and retain them. Fourth, they are used to monitor the performance of individual units within the university, such as departments or faculties. Fifth, research assessments are used to identify current and potential partners for scientific collaboration. And last, they are used at the level of the university as a whole to benchmark against peers.

Given this variety of assessment applications, it is not surprising that universities encounter a number of problems, and the report identifies several of them. The relevant data are diverse and currently often not integrated in tools. There is a lack of agreement on definitions and standards: who counts as a ‘researcher’, for example, may differ across university systems. Funding mechanisms are still predominantly national and differ significantly between countries. And finally, many databases used in research assessments are proprietary and cannot be controlled by the universities themselves. Moreover, Phillips signals that perverse effects can be expected from current assessment procedures. Measurement cultures may “distract from the academic mission”. It is important to be aware of disciplinary differences, for example with respect to the numbers of citations and the relevant time frames of the measurement. Last, the report mentions that academics may feel threatened by research assessments.

In addition to these problems and dangers, the report identifies two important novel developments in research assessment: the European project to rank universities in multiple dimensions (U-Multirank), and the recent emphasis on the societal impact of research in evaluations. The report is rather critical of U-Multirank. The project, in which CWTS participates, aims to address a major problem in current university rankings. Apart from the research-focused rankings, such as the Leiden Ranking or the SCImago Ranking, global rankings have combined different dimensions, such as the quality of education and research output, in an arbitrary way. They also apply one model to all universities, even though universities may have very different missions. It therefore makes more sense to compare universities with similar missions. “According to the multidimensional approach a focused ranking does not collapse all dimensions into one rank, but will instead provide a fair picture of institutions (‘zooming in’) within the multi-dimensional context provided by the full set of dimensions.” (U-Multirank) In principle, LERU supports this approach and it was also involved in the first-stage feasibility study. However, a number of concerns have led LERU to disengage from the project.

“Our main concerns relate to the lack of good or relevant data in several dimensions, the problems of comparability between countries in areas such as funding, the fact that U-Multirank will not attempt to evaluate the data collected, i.e. there will be no “reality-checks”, and last but by no means least, the enormous burden put upon universities in collecting the data, resulting in a lack of involvement from a good mix of different types of universities from all over the world, which renders the resulting analyses and comparisons suspect.” These concerns have led the organization to turn away from rankings as an instrument of assessment. The European Commission has not followed this reasoning and has recently decided to publish a call for the second stage of the U-Multirank project. The consortium has not yet publicly replied to LERU’s critique.


Plea for assessments, against bean counting – part 1

“Above all, universities should stand firm in defending the long-term value of their research activity, which is not easy to assess in a culture where return on investment is measured in very short time spans.” This is the central message of a new position paper recently published by the League of European Research Universities (LERU) about the way universities should handle the evaluation of research. In many ways, it is a sensible report which tries to strike a careful balance between the different interests involved. The report was written by Mary Phillips, former director of Research Planning at University College London and currently an adviser to Academic Analytics, a for-profit consultancy in the area of research evaluation (and hence one of CWTS’ competitors). The report is a plea for the combined application of peer review and bibliometrics by university management. It also contains a number of principles that LERU would like to see implemented by universities in their assessment procedures.

The report’s point of departure is the observation that assessments have become part and parcel of the university. At the same time, the types of assessments possible and the different methodologies have exploded. This stimulates “an obsession with measurement and monitoring, which may result in a ‘bean counting’ culture detracting from the real quality of research”. Indeed, this has already begun. The dilemmas are made worse by the fact that universities need to deal with large quantities of data and require sophisticated human resource and research management tools, which they often currently lack. On top of all this, funding regimes tend to create incentives which may tempt universities to, as the report with a feeling for understatement puts it, “behave in certain ways, sometimes with unfortunate consequences”.

One of the implications is that any assessment system must be sensitive to possible perverse incentives, take disciplinary differences into account, and cover a long enough time frame (at least five years, according to the report). Assessments should “reflect the reality of research”, including the aspirations of the researchers involved. “Thus, senior administrators and academics must take account of the views of those ‘at the coal-face’ of research”. Assessments should be “as transparent as possible”. Universities are advised to improve their data management systems. And researchers “should be encouraged (or compelled) when publishing, to use a unique personal and institutional designation, and to deposit all publications into the university’s publications database”.

Universities demand full transparency of university rankings

Last week, I attended a two-day conference in Mexico City of the rectors of 65 Latin American universities about global university rankings. The meeting concluded by adopting a “Final Declaration” signed by the majority of the attending universities. At times, it was a debate in which emotions ran high. Clearly, many university leaders felt that they were badly served by most global university rankings. In this, they were supported by the keynote speaker, Simon Marginson, a higher education expert from the University of Melbourne (Australia). He gave an excellent speech in which he showed how most rankings are based on a particular model of higher education as a globalized market. In this framework US universities are dominant. Many rectors were of the opinion that the social mission of the Latin American universities will not be valued in this model. Moreover, performance at the international research front is dominant in most rankings, including our Leiden Ranking. Latin American universities do not score highly, if they make it into the rankings at all.

The meeting was organized by Imanol Ordorika, director of institutional evaluation of the National Autonomous University of Mexico (UNAM). A former leader of the 1987 student demonstrations, he focuses both on international research (in the field of higher education) and on the social role of the universities. The countries in Latin America are confronted with high levels of corruption, enormous economic and social inequalities, and the need for much better mass education. Although these universities are huge (UNAM has more than 300,000 students), they still cannot accommodate all young people who aspire to study. Approximately one-fifth of Latin America’s youth neither studies nor works. No wonder that university rectors worry not only about their international research effort, but at least as much or more about their role in improving the educational systems of their countries.

Against this background, the well-known deficiencies of many global university rankings become even more pressing. This was also the reason to organize the conference. Increasingly, universities that score low in the rankings, or do not appear in them at all – such as the Times Higher Education World University Rankings, the QS World University Rankings, the Academic Ranking of World Universities (the Shanghai Ranking), or the Ranking Web of World Universities – are questioned about their performance. According to the declaration adopted at the conference, the current rankings have many undesirable effects, such as a homogenizing impact in which the elite US-based research university is dominant, a bias in the perception of the performance of Latin American universities, an undermining of the legitimacy of national higher education institutions, and the mistaken tendency to see rankings as information systems.

Key problems of the global rankings discussed were: the arbitrary way in which different indicators are combined into one composite indicator; the lack of visibility of the humanities and social sciences; the neglect of the social and cultural impact of the universities; and last but not least the lack of transparency of both the methods and the data used to calculate the indicators. The Leiden Ranking was praised for its transparency and its focus, as was the SCImago Ranking. It was seen as helpful that these rankings make very explicit what they do and do not measure. Of course, these rankings do not make it possible to compare universities on their social mission; for this, other measures are needed.

The “Final Declaration” demanded that governments in Latin America avoid using the rankings as elements in evaluating universities’ performance. They were also advised to encourage the creation of public databases that permit well-founded knowledge of the performance of the higher education system. The ranking producers were called upon to adhere to the 2006 “Berlin Principles on Ranking of Higher Education Institutions”. Rankings should be 100% transparent. Ranking producers should also engage in more interaction with the universities. The declaration notes that there is currently no consensus on criteria for measuring the quality of universities: “Any selection of parameters or quantitative indicators to sum up the qualities of universities is rather arbitrary”. The media are admonished to provide more balanced coverage of the rankings. And the universities in Latin America are encouraged to adopt policies that promote transparency, accountability and open access. Rankings can play a role here. However, universities should not sacrifice “our fundamental responsibilities” in order to implement “superficial strategies designed to improve our standings in the rankings”.

Paul Wouters

On organizational responses to rankings

From 13 to 15 September 2012, the departments of Sociology and Anthropology at Goldsmiths are hosting an interdisciplinary conference on ‘practicing comparisons’. Here’s the call for papers. We submitted the following abstract, together with Roland Bal and Iris Wallenburg (Institute of Health Policy and Management, Erasmus University). This cooperation is part of a new line of research on the impacts of evaluation processes on knowledge production.

“Comparing comparisons. On rankings and accounting in hospitals and academia

Not much research has been done as yet on the ways in which rankings affect academic and hospital performance. The little evidence there is focuses on the university sector. Here, an interest in rankings is driven by a competition in which universities are being made comparable on the basis of ‘impact’. The rise of performance-based funding schemes is one of the driving forces. Some studies suggest that shrinking governmental research funding from the 1980s onward has resulted in “academic capitalism” (cf. Slaughter & Leslie 1997). By now, universities have set up special organizational units and have devised specific policy measures in response to ranking systems. Recent studies point to the normalizing and disciplining powers associated with rankings and to ‘reputational risk’ as explanations for organizational change (Espeland & Sauder 2007; Power et al. 2009; Sauder & Espeland 2009). Similar claims have been made for the hospital sector in relation to the effects of benchmarks (Triantafillou 2007). Here, too, we witness a growing emphasis on ‘reputation management’ and on the use of rankings in quality assessment policies.

The modest empirical research done thus far mainly focuses on higher management levels and/or on large institutional infrastructures. Instead, we propose to analyze hospital and university responses to rankings from a whole-organization perspective. Our work zooms in on so-called composite performance indicators that combine many underlying specific indicators (e.g. patient experiences, outcome, and process and structure indicators in the hospital setting, and citation impact, international outlook, and teaching in university rankings). Among other things, we are interested in the kinds of ordering mechanisms (Felt 2009) that rankings bring about on multiple organizational levels – ranging from the managers’ office and the offices of coding staff to the lab benches and hospital beds.

In the paper, we first of all analyze how rankings contribute to making organizations auditable and comparable. Secondly, we focus on how rankings translate, purify, and simplify heterogeneity into an ordered list of comparable units, and on the kinds of realities that are enacted through these rankings. Thirdly, and drawing on recent empirical philosophical and anthropological work (Mol 2002, 2011; Strathern 2000, 2011; Verran 2011), we ask how we as analysts ‘practice comparison’ in our attempt to make hospital and university rankings comparable.”

Journal ranking biased against interdisciplinary research

The widespread use of journal rankings in research institutes and universities puts interdisciplinary research at a disadvantage in assessment exercises such as the British Research Excellence Framework. This is the conclusion of a paper presented at the 2011 Annual Meeting of the Society for Social Studies of Science in Cleveland (US) by Ismael Rafols (SPRU, Sussex University), Loet Leydesdorff (University of Amsterdam), and Alice O’Hare, Paul Nightingale and Andy Stirling (all SPRU, Sussex University). The study provides the first quantitative evidence that researchers working at the boundaries between different research fields may be disadvantaged compared with monodisciplinary colleagues. It also argues that citation analysis, if properly applied, is a better measurement instrument than a ranked journal list.

The study is quite relevant for research management at universities and research institutes. Journal lists have become a very popular management tool. In many departments, researchers are obliged to publish in a limited set of journals. Some departments, for example in economics, have even been reorganized on the basis of publications in the journals on such a list. The way these lists are composed varies: sometimes a group of experts decides whether a journal belongs on the list, sometimes the Journal Impact Factor published by ISI/Thomson Reuters is the determining factor.

The study by Rafols et al. analyzed one such list: the ranked journal list used by the British Association of Business Schools. This list is based on a mix of citation statistics and peer review. It ranks scholarly journals in business and management studies in five categories, ranging from category 1 (“modest standard journals”) to category 4* (“world elite journals”). This scheme reflects the experience researchers have with the Research Assessment Exercise categories. The ranked journal list is meant to be used widely for a variety of management goals: it serves as advice to researchers about the best venues for their manuscripts, libraries are supposed to use it in their acquisition policies, and last but not least it is used in research assessments and personnel evaluations. Although the actual use of the list is an interesting research topic in itself, we can safely assume that it has had a serious impact on researchers in the British business school community.

The study shows first of all that the position of a journal in the ranked list correlates negatively with the extent of interdisciplinarity of the journal. In other words, the higher a journal’s position in the list, the narrower its disciplinary focus. (The study used several indicators of interdisciplinarity, each capturing a different aspect of what it means to be interdisciplinary; see the sketch below.) Rewarding researchers for publishing primarily in journals on the ranked list may therefore discourage interdisciplinary work.
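For readers curious about how such interdisciplinarity indicators can be constructed: one measure commonly used in this literature is Rao-Stirling diversity, which weights the spread of a journal’s references over disciplinary categories by the cognitive distance between those categories. The sketch below uses hypothetical category shares and distances; it is not the study’s data, and the study itself combines several indicators rather than this single one.

```python
# Sketch of a Rao-Stirling-style diversity score for a journal
# (hypothetical proportions and distances, not the study's data).
# diversity = sum over unordered category pairs (i, j) of p_i * p_j * d_ij,
# where p_i is the share of the journal's references falling in category i
# and d_ij is the cognitive distance between categories i and j.

proportions = {"economics": 0.5, "psychology": 0.3, "engineering": 0.2}
distances = {
    ("economics", "psychology"): 0.4,
    ("economics", "engineering"): 0.7,
    ("psychology", "engineering"): 0.8,
}

def rao_stirling(p, d):
    """Sum p_i * p_j * d_ij over all unordered pairs of categories."""
    cats = list(p)
    return sum(
        p[a] * p[b] * d.get((a, b), d.get((b, a), 0.0))
        for i, a in enumerate(cats)
        for b in cats[i + 1:]
    )

print(rao_stirling(proportions, distances))  # higher values = more interdisciplinary
```

A journal whose references are concentrated in a single category, or spread only over cognitively close categories, receives a low score under such a measure.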

The study confirms this effect by comparing business and management studies with innovation studies. Both fields are subject to the same evaluation regime in the Research Excellence Framework. Intellectually, they are very close. However, they differ markedly with respect to their interdisciplinary nature. Researchers in business schools have a more traditional publishing behaviour than their innovation studies colleagues, and the research units in innovation studies are consistently more interdisciplinary than the business and management schools.

Of course, publication behaviour is shaped by a variety of influences. Peer review may be biased against interdisciplinary work because its quality is more difficult to assess. Many top journals are not eager to publish interdisciplinary work. This study is the first to show convincingly that these existing biases tend to be strengthened by the use of ranked journal lists as a tool in research management. The study confirms this effect by comparing the performance based on the ranked journal list with a citation analysis. In the latter, innovation studies research is not penalized for its more interdisciplinary character, as it is in an assessment based on the journal list. The paper concludes with a discussion of the negative implications in terms of funding and acquiring resources for research groups working at the boundaries of different fields.

The paper will be published in a forthcoming issue of Research Policy and was awarded the best paper prize at the Atlanta Conference on Science and Innovation Policy in September 2011.

Reference: Ismael Rafols, Loet Leydesdorff, Alice O’Hare, Paul Nightingale, & Andy Stirling, “How journal rankings can suppress interdisciplinary research: A comparison between innovation studies and business & management,” paper presented at the Annual Meeting of the Society for Social Studies of Science (4S), Cleveland, OH, November 2011; available at http://arxiv.org/abs/1105.1227.

Harvard no longer number 1 in ranking

Recently, the new Times Higher Education World University Rankings 2011-2012 were published. The ranking revealed that Harvard University is no longer number one on the list, although the differences with Caltech – now the highest – are minimal. The main reason for Caltech’s rise is the extra revenue it drew from industry: Caltech’s income increased by 16%, thereby outclassing most other universities. Harvard scored a bit better when it comes to the educational environment. Other universities also rose on the list as a result of successful campaigns to obtain (more) external financing. The London School of Economics, for example, moved from 86 to 47. The top of the ranking did not change that drastically, though. Rich US-based universities still dominate the list: seven of the ten highest-ranked universities, and one third of the top 200, are located in the US.

This illustrates the THE ranking’s sensitivity to slight differences between the indicators that, taken together, determine the order of the ranking. The ranking is based on a mix of many different indicators. There is no standardized way to combine these indicators, and therefore there is inevitably a certain arbitrariness to the process. In addition, the THE ranking is partly based on the results of a global survey, which invites researchers and professors to assess the reputation of universities. One of the unwanted effects of this method is that well-known universities are more likely to be assessed positively than less well-known universities. Highly visible scandals and abuses may also influence survey results.
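To make the point about weighting concrete, the toy calculation below (with made-up indicator scores and weights that are not the actual THE indicators or methodology) shows how two different but equally defensible weighting schemes put the same two hypothetical universities in opposite order.

```python
# Toy illustration of how the choice of weights determines a composite ranking.
# The scores and weights are invented; they are not the THE methodology.
universities = {
    "University X": {"research": 90, "teaching": 60, "industry_income": 40},
    "University Y": {"research": 70, "teaching": 80, "industry_income": 85},
}

def composite(scores, weights):
    """Weighted sum of indicator scores (assumed to be on the same scale)."""
    return sum(scores[indicator] * weight for indicator, weight in weights.items())

scheme_a = {"research": 0.6, "teaching": 0.3, "industry_income": 0.1}
scheme_b = {"research": 0.3, "teaching": 0.3, "industry_income": 0.4}

for name, weights in [("scheme A", scheme_a), ("scheme B", scheme_b)]:
    ranking = sorted(universities, key=lambda u: composite(universities[u], weights), reverse=True)
    print(name, ranking)
# Scheme A puts University X first; scheme B puts University Y first.
```

Neither weighting is demonstrably more correct than the other, which is precisely the arbitrariness referred to above.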

This year, the ranking’s sensitivity to the way in which different indicators are combined is aptly illustrated by the position of the Dutch universities. The Netherlands ranks third, with 12 universities in the top 200 and 4 in the top 100 of the world. Given the size of the country, this is a remarkable achievement. The result is partly due to the strong international orientation of the Dutch universities, and partly to previous investments in research and education. But just as important is the weight given to the performance of the social sciences and humanities in a number of indicators. Compared to last year, the overall performance of Dutch universities most likely did not increase that much; a more plausible explanation is that their profile of activities and impact is now better covered by the THE ranking.

The latest THE ranking does make clear that size is not the most important determinant in positioning universities. Small specialized universities can end up quite high on the list.

Still using the Hirsch index? Don’t!

“My research: > 185 papers, h-index 40.” A random quote from a curriculum vitae on the World Wide Web. Researchers sometimes love their Hirsch index, better known as the h-index. But what does the measure actually mean? Is it a reliable indicator of scientific impact?

Our colleagues Ludo Waltman and Nees Jan van Eck have studied the mathematical and statistical properties of the h-index. Their conclusion: the h-index can produce inconsistent results. For this reason, it is not the reliable measure of scientific impact that most users think it is. As a leading scientometric institute, we have therefore published the advice to all universities, funders, and academies of science to abandon the use of the h-index as a measure of the overall scientific impact of researchers or research groups. There are better alternatives. The paper by Waltman and Van Eck is now available as a preprint and will soon be published in the Journal of the American Society for Information Science and Technology (JASIST).

The h-index combines productivity and citation impact. It is calculated by ranking a researcher’s publications by the number of citations they have received. For example, someone with an h-index of 40 has published at least 40 articles that have each been cited at least 40 times, while the remaining articles have each been cited no more than 40 times. The higher the h-index, the better.
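In code, the calculation is short. The sketch below is a hypothetical illustration (not software used by CWTS or by any ranking producer): it sorts a researcher’s per-paper citation counts and finds the largest rank at which the citation count still reaches the rank.

```python
def h_index(citations):
    """Return the h-index for a list of per-paper citation counts."""
    ranked = sorted(citations, reverse=True)  # most cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # at least `rank` papers have at least `rank` citations
        else:
            break
    return h

print(h_index([10, 3, 7, 0, 5]))  # -> 3: three papers with at least 3 citations each
```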

The h-index was proposed by the physicist Jorge Hirsch in 2005. It was an immediate hit. Nowadays, there are about 40 variants of the h-index. About one quarter of all articles published in the main scientometric journals have cited Hirsch’s article in which he describes the h-index. Even more important has been the response by scientific researchers using the h-index. The h-index has many fans, especially in fields that exchange many citations, such as the biomedical sciences. The h-index is almost irresistible because it seems to enable a simple comparison of the scientific impact of different researchers. Many institutions have been seduced by the siren call of the h-index. For example, the Royal Netherlands Academy of Arts and Sciences (KNAW) asks for the value of the h-index on its recent forms for new members. Individual researchers can look up their h-index based on Google Scholar documents via Harzing’s Publish or Perish website. Both economists and computer scientists have produced rankings of their fields based on the h-index.

Our colleagues Waltman and Van Eck have now shown that the h-index has some fatal shortcomings. For example, if two researchers with different h-indices co-author papers together, this may lead to a reversal of their positions in an h-index-based ranking. The same may happen when we compare research groups. Suppose we have two groups and each member of group A has a higher h-index than a paired researcher in group B. We would then expect the h-index of group A as a group to be higher than that of group B as well. That does not have to be the case. Please note that we are speaking here of a calculation of the h-index based on a complete and reliable record of documents and citations; the problematic nature of the data if one uses Google Scholar as a data source is a different matter. So even when we have complete and accurate data, the h-index may produce inconsistent results. Surely, this is not what one wants when using the index for evaluation purposes!
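A toy example with invented numbers (my own illustration, not one taken from the Waltman and Van Eck paper) shows how such a reversal can arise when two researchers add the same co-authored papers to their publication records.

```python
def h_index(citations):
    """h-index for a list of per-paper citation counts."""
    ranked = sorted(citations, reverse=True)
    return max((rank for rank, c in enumerate(ranked, start=1) if c >= rank), default=0)

# Before: A has 4 papers cited 4 times each (h = 4),
#         B has 3 papers cited 6 times each (h = 3), so A ranks above B.
a = [4, 4, 4, 4]
b = [6, 6, 6]
print(h_index(a), h_index(b))  # 4 3

# They then co-author three papers together, each cited 5 times.
joint = [5, 5, 5]
print(h_index(a + joint), h_index(b + joint))  # 4 5: the ranking is reversed
```

Identical additions to both publication records thus change who comes out on top, which is exactly the kind of inconsistency the paper objects to.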

At CWTS, we have therefore drawn the conclusion that the h-index should not be used as a measure of scientific impact in the context of research evaluation.
