Tales from the field: On the (not so) secret life of performance indicators

* Guest blog post by Alex Rushforth *

Sarah De Rijcke and I have been accepted to present at conferences in Valencia and Rotterdam in the coming months, on research from CWTS’s nascent EPIC working group. We very much look forward to drawing on collaborative work from our ongoing ‘Impact of indicators’ project on biomedical research in University Medical Centers (UMCs) in the Netherlands. One of our motivations behind the project is that a wealth of recent social science literature addresses the effects of formal evaluation in public sector organisations, including universities. Yet too few studies have taken seriously the presence of indicators in the context of one of the university’s core missions: knowledge creation. Fewer still have applied an ethnographic lens to the dynamics of indicators in the day-to-day work context of academic knowledge. These are deficits we hope to begin addressing through these conferences and beyond.

The puzzle we will be addressing here appears – at least at first glance – straightforward enough: what is the role of bibliometric performance indicators in the biomedical knowledge production process? Yet comparing provisional findings from two contrasting case studies of research groups from the same UMC – one a molecular biology group and the other a statistics group – it quickly becomes apparent that there can be no general answer to this question. As such we aim to provide not only an inventory of different ‘roles’ of indicators in these two cases, but also to pose the more interesting analytical question of what conditions and mechanisms explain the observed variations in the roles indicators come to perform.

Owing to their persistent recurrence in the data so far, the indicators we will analyze are the journal impact factor, the H-index, and ‘advanced’ citation-based bibliometric indicators. It should be stressed that our focus on these particular indicators has emerged inductively from observing first-hand the metrics that research groups attended to in their knowledge-making activities. So what have we found so far?

Dutch UMCs constitute particularly apt sites through which to explore this problem, given how bibliometric assessments have been central to the formal evaluations carried out since their inception in the early 2000s. On one level we argue that researchers in both cases encounter such metrics as ‘governance/managerial devices’, that is, as forms of information required of them by external agencies on whom they are reliant for resources and legitimacy. Such examples can be seen when funding applications, annual performance appraisals, or job descriptions demand such information about an individual’s or group’s past performance. As the findings will show, the information needed by the two groups to produce their work effectively and the types of demands made on them by ‘external’ agencies vary considerably, despite their common location in the same UMC. This is one important reason why the role of indicators differs between cases.

However, this coercive ‘power over’ account is but one dimension of a satisfying answer to our role of indicators question. Emerging analysis also reveals, surprisingly, that in fields characterized by particularly integrated forms of coordination and standardization (Whitley, 2000) – like our molecular biologists – indicators in fact have the propensity to function as a core feature of the knowledge-making process. For instance, a performance indicator like the journal impact factor was routinely mobilized informally in researchers’ decision-making as an ad hoc standard against which to evaluate the likely uses of information and resources, and in deciding whether time and resources should be spent pursuing them. By contrast, in the less centralized and integrated field of statistical research such an indicator was not so indispensable to the routines of knowledge-making activities. In the case of the statisticians it is possible to speculate that indicators are more likely to emerge intermittently as conditions to be met for gaining social and cultural acceptance by external agencies, but are less likely to inform day-to-day decisions. Through our ongoing analysis we aim to unpack further how disciplinary practices interact with the organisation of Dutch UMCs to produce quite varying engagements with indicators.

The extent to which indicators play central or peripheral roles in research production processes across academic contexts is an important sociological problem to pose in order to enhance understanding of the complex role of performance indicators in academic life. We feel much of the existing literature on evaluation of public organisations has tended to paint an exaggerated picture of formal evaluation and research metrics as synonymous with empty ritual and legitimacy (e.g. Dahler-Larsen, 2012). Emerging results here show that – at least in the realm of knowledge production – the picture is more subtle. This theoretical insight will prompt us to suggest that further empirical studies are needed of scholarly fields with different patterns of work organisation in order to compare our results and develop middle-range theorizing on the mechanisms through which metrics infiltrate knowledge production processes to fundamental or peripheral degrees. In future this could mean venturing into fields far outside of biomedicine, such as history, literature, or sociology. For now though we look forward to expanding the biomedical project by conducting analogous case studies from a second UMC.

Indeed it is through such theoretical developments that we can consider not only the appropriateness of one-size-fits-all models of performance evaluation, but also unpack and problematize discourses about what constitutes ‘misuse’ of metrics. And indeed how convinced should we be that academic life is now saturated and dominated by deleterious metric indicators? 


How does science go wrong?

We are happy to announce that our abstract has been accepted for the 2014 Conference of the European Consortium for Political Research (ECPR), which will be held in Glasgow from 3-6 September. Our paper has been selected for a panel on ‘The role of ideas and indicators in science policies and research management’, organised by Luis Sanz-Menéndez and Laura Cruz-Castro (both at CSIC-IPP).

Title of our paper: How does science go wrong?

“Science is in need of fundamental reform.” In 2013, five Dutch researchers took the lead in what they hope will become a strong movement for change in the governance of science and scholarship: Science in Transition. SiT appears to voice concerns heard beyond national borders about the need for change in the governance of science (cf. The Economist 19 October 2013; THE 23 Jan. 2014; Nature 16 Oct. 2013; Die Zeit 5 Jan. 2014). One of the most hotly debated concerns is quality control, and it encompasses the implications of a perceived increasing publication pressure, purported flaws in the peer review system, impact factor manipulation, irreproducibility of results, and the need for new forms of data quality management.

One could argue that SiT landed in fertile ground. In recent years, a number of severe fraud cases drew attention to possible ‘perverse effects’ in the management system of science and scholarship. Partly due to the juicy aspects of most cases of misconduct, these debates tend to focus on ‘bad apples’ and shy away from more fundamental problems in the governance of science and scholarship.

Our paper articulates how key actors construct the notion of ‘quality’ in these debates, and how they respond to each other’s position. By making these constructions explicit, we shift focus back to the self-reinforcing ‘performance loops’ that most researchers are caught up in at present. Our methodology is a combination of the mapping of the dynamics of media waves (Vasterman, 2005) and discourse analysis (Gilbert & Mulkay, 1984).


On exploding ‘evaluation machines’ and the construction of alt-metrics

The emergence of web-based ways to create and communicate new knowledge is affecting long-established scientific and scholarly research practices (cf. Borgman 2007; Wouters, Beaulieu, Scharnhorst, & Wyatt 2013). This move to the web is spawning a need for tools to track and measure a wide range of online communication forms and outputs. By now, there is a large differentiation in the kinds of social web tools (e.g. Mendeley, F1000, Impact Story) and in the outputs they track (e.g. code, datasets, nanopublications, blogs). The expectations surrounding the explosion of tools and big ‘alt-metric’ data (Priem et al. 2010; Wouters & Costas 2012) marshal resources at various scales and gather highly diverse groups in pursuing new projects (cf. Brown & Michael 2003; Borup et al. 2006 in Beaulieu, de Rijcke & Van Heur 2013).

Today we submitted an abstract for a contribution to Big Data? Qualitative approaches to digital research (edited by Martin Hand & Sam Hillyard and contracted with Emerald). In the abstract we propose to zoom in on a specific set of expectations around altmetrics: their alleged usefulness for research evaluation. Of particular interest to this volume is how altmetrics information is expected to enable a more comprehensive assessment of 1) social scientific outputs (under-represented in citation databases) and 2) wider types of output associated with societal relevance (not covered in citation analysis and allegedly more prevalent in the social sciences).

In our chapter we address a number of these expectations by analyzing 1) the discourse in the “altmetrics movement”, the expectations and promises formulated by key actors involved in “big data” (including commercial entities); and 2) the construction of these altmetric data and their alleged validity for research evaluation purposes. We will combine discourse analysis with bibliometric, webometric and altmetric methods, so that the two approaches also interrogate each other’s assumptions (Hicks & Potter 1991).

Our contribution will show, first of all, that altmetric data do not simply ‘represent’ other types of outputs; they also actively create a need for these types of information. These needs will have to be aligned with existing accountability regimes. Secondly, we will argue that researchers will develop forms of regulation that will partly be shaped by these new types of altmetric information. They are not passive recipients of research evaluation but play an active role in assessment contexts (cf. Aksnes & Rip 2009; Van Noorden 2010). Thirdly, we will show that the emergence of altmetric data for evaluation is another instance (following the creation of the citation indexes and the use of web data in assessments) of transposing traces of communication into a framework of evaluation and assessment (Dahler-Larsen 2012, 2013; Wouters 2014).

By making explicit what the implications are of the transfer of altmetric data from the framework of the communication of science to the framework of research evaluation, we aim to contribute to a better understanding of the complex dynamics in which a new generation of researchers will have to work and be creative.

Who is the modern scientist? Lecture by Steven Shapin

There are now many historical studies of what’s been called scientists’ personæ – the typifications, images, and expectations attached to people who do scientific work. There has been much less interest in the largely managerial and bureaucratic exercises of counting scientists – finding out how many there are, of what sorts, working in what institutions. This talk first describes how and why scientists came to be counted from about the middle of the twentieth century and then relates those statistical exercises to changing senses of who the scientist was, what scientific inquiry was, and what it was good for.

Date: Thursday 28 November 2013

Time: 5-7 pm

Place: Felix Meritis (Teekenzaal), Keizersgracht 324, Amsterdam

Update Crafting Your Career (CYC)

Crafting your Career (the event co-organised by CWTS and the Rathenau Instituut, 30 October 2013) is attracting a lot of attention. With only two weeks to go, 173 people have registered (we’re aiming for 200) and over 1,000 people have taken our researcher motivation test. CYC will facilitate a balanced discussion about the pros and cons of recent trends in research evaluation and their effects on scientific research and scientific careers. While we are busy putting together a program leaflet, our moderators are contacting speakers about the details of the interviews and panel debate. Our Rathenau colleagues are working out the details of the ‘fair’ that takes place during the extended break, and Laurens Hessels and I will have a short meeting tomorrow with KNAW-president prof. Hans Clevers to discuss his opening address.

One of our speakers, Dr. Ruth Müller, was interviewed by ScienceGuide on the occasion of our event. Here’s what she has to say about how post-docs structure academic careers in the life sciences, and the pressures they are experiencing.

Should science studies pay more attention to scientific fraud?

Last week, the Dutch scientific community was rocked by the publication of the final report on the large-scale fraud committed by former professor in social psychology, Diederik Stapel. Three committees performed an extraordinarily thorough examination of the full scientific publication record produced by Stapel and his 70 co-authors. Stapel was known in the Dutch media as the “golden boy” of social psychology. The scientific establishment was also blinded by his apparent success in producing massive amounts of supposedly ingenious experiments. He was appointed as fellow of the Royal Netherlands Academy of Arts and Sciences (KNAW) early in his career and collected large amounts of subsidy from the Dutch science foundation NWO.

In at least 55 publications the data have been fully or partially fabricated. This was done in a cunning way, since at least 1996. Stapel has cooperated with the investigation, but the report mentions that he “did not recognize himself” in the image that the report sketches of a manipulating and at times intimidating schemer. As if to emphasize his role as poseur, Stapel published a book about his fraud the day after the formal report was made public. He even started a tour of signing sessions in the most prestigious academic bookshops in the Netherlands last weekend. Shamelessness has always been a defining characteristic of con men. An investigation by the Dutch prosecutor is still ongoing to see whether Stapel can be brought to justice for fraudulent behavior or financial misdemeanors. So it remains to be seen how long he can go where he pleases.

Perhaps more important than the fraud itself (the report concludes that Stapel did not have much impact on his field), is the conclusion that there is something fundamentally wrong with the research culture in social psychology. On top of the “usual publication bias” (journals prefer positive results over negative results, even when the latter are actually more important), the committees found a strong verification bias. Researchers did everything they could to confirm their hypothesis, including redacting the data, misrepresenting the experiments, copying data from one experiment to another, etc. The report also notes a glaring lack of statistical knowledge among co-authors of quantitative research publications. Since the discovery of the Stapel fraud, social psychologists have taken a number of initiatives to remedy the situation, including strict data and data-sharing protocols, and initiatives to promote replication of experiments and secondary data analysis.

The question is whether this is enough. Social psychology is not the only field confronted with large-scale fraud. For example, the damage done by fraudulent or low-quality research in the medical sciences may actually be greater. The Erasmus University Rotterdam is now confronted with the gigantic task of checking more than 600 publications written by a suspect cardiac researcher who denies the accusations. Apparently, the system of peer review not only fails to discover fraud in social psychology; there is a potentially far bigger problem in the medical and clinical sciences. Anti-fraud measures that will be taken in the next few years in these fields will have a strong influence on the research agendas. It therefore seems natural to expect that science studies experts, specialized in analyzing the politics, culture, and economics of scientific and scholarly research, should be able to make a serious contribution.

Yet this has not happened. The key players in the Stapel discovery are the whistle-blowers (3 PhD students), ex-presidents of the KNAW, social psychologists and statistical experts. Science studies experts have not been involved. This is not new. Journalists are often more active in discovering fraud than science studies scholars. I do not think this is coincidental. I see a more fundamental and a more practical explanation. The practical one is that science studies researchers often do not have the data to play a role in detecting and analyzing fraud. Most steps in the quality control processes in science, based on peer review, are confidential. For example, I once tried to get access to an archive of a scientific journal to study the history of that journal, a rather innocent request, and even that was denied. Also, quantitative science studies such as citation analysis cannot detect fraud because effective fraudulent papers are cited in the same ways as sound scientific articles. Bibliometrics does not measure quality directly, but basically measures how the scientific community responds to new papers. If a community fails collectively, bibliometrics fails as well.

The more fundamental reason is that constructivism in science studies has developed a strong neutral attitude (“symmetry”) with respect to the prevailing epistemic cultures. Science studies mostly abstains from a normative perspective and instead tries to analyze how research “really happens”. Since Trevor Pinch’s article on para-psychology in 1979, science studies has questioned the way science and non-science are demarcated by the scientific establishment. Recently, renewed attention has been paid to the ways science is appropriated and steered by powerful political and commercial interests, such as the manipulation of medical research by the pharmaceutical industry. This new emphasis on a more normative research program in science studies may now need to be further stimulated.

In other words, it may make sense for science studies scholars to question their current priorities in the wake of the link between fraud and epistemic cultures. Let me suggest some components of a research agenda. First of all, what kind of phenomenon is scientific fraud actually? When does fraud manifest itself, how is it defined, and by whom? These questions fit comfortably with the dominant constructivist paradigm. Answering them would be an important contribution because there are many grey areas between the formal scientific ideology (such as represented by first-year textbooks) and the actual research practice in a particular lab or institute. Second, we may need to become more normative. How can we detect fraud? What circumstances enable fraud? What kind of configurations of power, accountability and incentives may hinder fraud? I think there is considerable scope for case studies, histories and quantitative research to help tackle these questions.

Quantitative science studies may also contribute. An obvious question is to what extent retracted publications still circulate in the scholarly archive. A more difficult one is whether the combination of citation analysis and full-text analysis may help detect patterns that may identify potential fraud cases. Given the role of the number of citations in performance indicators such as the Journal Impact Factor and the Hirsch Index, we may also want to be more active in detecting “citation clubs” where researchers set up cartels to boost each other’s citation records. I do not think that purely algorithmic approaches will be able to establish cases of fraud, but they may help as an information filter to be able to zoom in on suspect cases.
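
To make the idea of an information filter a little more concrete, here is a minimal sketch in Python. It is an illustration only, not a method used in any of the investigations mentioned above. It assumes a hypothetical list of (citing author, cited author) pairs, and the function names and the threshold `min_each_way` are invented for this example. The first function computes the Hirsch index from a list of citation counts; the second flags pairs of authors who cite each other heavily in both directions. Such a flag says nothing about fraud in itself: close collaborators in a small specialty will cite each other often for perfectly good reasons.

```python
from collections import Counter


def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def reciprocal_pairs(citations, min_each_way=3):
    """Flag author pairs citing each other at least min_each_way times both ways.

    `citations` is a hypothetical list of (citing_author, cited_author) tuples;
    the threshold is an arbitrary assumption, not an established cut-off.
    """
    counts = Counter(citations)  # (citing, cited) -> number of citations
    flagged = []
    for (a, b), n_ab in counts.items():
        if a < b:  # visit each unordered pair once; skips self-citations
            n_ba = counts.get((b, a), 0)
            if n_ab >= min_each_way and n_ba >= min_each_way:
                flagged.append((a, b, n_ab, n_ba))
    return sorted(flagged)


# Invented example data: A and B cite each other often, A and C do not.
example = [("A", "B")] * 4 + [("B", "A")] * 3 + [("A", "C")] * 5 + [("C", "A")]
suspects = reciprocal_pairs(example)
```

With the invented data above, only the pair A and B is flagged. Any such list would be a starting point for closer reading by people who know the field, not a verdict.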

Last, but not least, it is high time to take a hard look at the evaluation culture in science, the recurring theme in this blog. The Stapel affair shows how the review committees in psychology have basically failed to detect fundamental weaknesses in the research culture of social psychology. The report asks whether this may be due to publication pressure, an excuse that co-authors of Stapel frequently invoked for being sloppy with the quality standards for an article. We know from many areas in science that the pressure to publish as fast as possible is felt acutely by many researchers. I do not think that publication pressure as such is a sufficient explanation for fraud (it is not the case that most researchers are fraudulent). But there is certainly a problem with the way researchers are being held accountable. Formal criteria (how often did you publish in high prestige journals?) are dominant, at the cost of more substantive criteria (what contribution did you make to knowledge?). Metrics are often used out of context. This evaluation culture should end. We need to go back to meaningful metrics in which the quality and content of one’s contribution to knowledge becomes primary again. As Dick Pels formulated it, it is high time to “unhasten science”. At CWTS, we wish to contribute to this goal with our new research program as well as with our bibliometric services.


Book release

Today we are witnessing dramatic changes in the way scientific and scholarly knowledge is created, codified, and communicated. This transformation is connected to the use of digital technologies and the virtualization of knowledge. In this book, scholars from a range of disciplines consider just what, if anything, is new when knowledge is produced in new ways. Does knowledge itself change when the tools of knowledge acquisition, representation, and distribution become digital? Issues of knowledge creation and dissemination go beyond the development and use of new computational tools. The book, which draws on work from the Virtual Knowledge Studio, brings together research on scientific practice, infrastructure, and technology. Focusing on issues of digital scholarship in the humanities and social sciences, the contributors discuss who can be considered legitimate knowledge creators, the value of “invisible” labor, the role of data visualization in policy making, the visualization of uncertainty, the conceptualization of openness in scholarly communication, data floods in the social sciences, and how expectations about future research shape research practices. The contributors combine an appreciation of the transformative power of the virtual with a commitment to the empirical study of practice and use.

Edited by Paul Wouters, Anne Beaulieu, Andrea Scharnhorst and Sally Wyatt.

Why do neoliberal universities play the numbers game?

Performance measurement has brought on a crisis in academia. At least, that’s what Roger Burrows (Goldsmiths, University of London) claims in a recent article for The Sociological Review. According to Burrows, academics are at great risk of becoming overwhelmed by a ‘deep, affective, somatic crisis’. This crisis is brought on by the ‘cultural flattening of market economic imperatives’ that fires up increasingly convoluted systems of measure. Burrows places this emergence of quantified control in academia within the broader context of neoliberalism. Though this has been argued before, Burrows gives the discussion a theoretical twist. He does so by drawing on Gane’s (2012) analysis of Foucault’s (1978-1979) lectures on the relation between market and state under neoliberalism. According to Foucault, neoliberal states can only guarantee the freedom of markets when they apply the same ‘market logic’ to themselves. In this view, the standard depiction of neoliberalism as passive statecraft is not correct. This type of management is not ‘laissez-faire’, but actively stimulates competition and privatization strategies.

In the UK, Burrows contends, the simulation of neoliberal markets in academia has largely been channelled through the introduction of audit and of performance measures. He argues that these control mechanisms become autonomous entities that are increasingly used outside the original context of evaluations, and get a much more active role in shaping the everyday work of academics. According to Burrows, neoliberal universities provide fertile ground for a “co-construction of statistical metrics and social practices within the academy.” Among other things, this leads to a reification of individual performance measures such as the H-index. Burrows:

“[I]t is not the conceptualization, reliability, validity or any other set of methodological concerns that really matter. The index has become reified; (…) a number that has become a rhetorical device with which the neoliberal academy has come to enact ‘academic value’.” (p. 361)

Interestingly, Burrows’ line of reasoning can in some respects itself be seen as a resultant of a broader neoliberal context. Neoliberal policies applaud personal autonomy and the individual’s responsibility for one’s own well-being and professional success. Burrows directly addresses fellow-academics (‘we need to obtain critical distance’; ‘we need to understand ourselves as academics’; ‘why do we feel the way we do?’) and concludes that we are all implicated in the ‘autonomization of metric assemblages’ in the academy. Arguably, it is exactly this neoliberal political climate that justifies Burrows’ focus on individual academics’ affective states. With it comes a delegation of responsibility to the level of the individual researchers. It is our own choice if we comply with the metricization of academia. It is our own choice if we decide to work long hours, spend our weekends writing grant proposals and articles and grading students’ exams. According to Gill (2010), academics tend to justify working so hard because they possess a passionate drive for self-expression and pleasure in intellectual work. Paradoxically, Gill argues, it is this drive that feeds a whole range of disciplinary mechanisms and that lets academics internalize a neoliberal subjectivity. We play ‘the numbers game’, as Burrows calls it, because of “a deep love for the ‘myth’ of what we thought being an intellectual would be like.” (p. 15)

Though Burrows raises concerns that are shared by many academics, it is unfortunate that he does not substantiate his claims with empirical data. Apart from his own experience and anecdotal evidence, how do we know that today’s researchers experience the metricization of academia as a ‘deep, affective somatic crisis’? Does it apply to all researchers, is it the same everywhere, and does it hold for all disciplines? These are empirical questions that Burrows does not answer. That said, there is a great need for the types of analyses Burrows and Gill provide, analyses that assess, situate and historicize academic audit cultures. It is not a coincidence that Burrows’ polemic piece emerges from the field of sociology. The social sciences and humanities are increasingly confronted with what Burrows calls the ‘rhetoric of accountability’. It has become a commonplace to argue that they, too, should be held accountable for the taxpayers’ money that is being spent on them. These disciplines, too, should be made auditable by way of standardized, transparent performance measures. I agree with Burrows that this rhetoric should be problematized. In large parts of these fields it is not at all clear how performance should be ‘measured’ in the first place, for example because of differences in publication cultures within these fields and as compared to the natural sciences. And it is precisely because the discussion is ongoing that we are allowed a clear view of the performative effects of a very specific and increasingly dominant evaluation culture that is not modelled by and on these disciplines. What are the consequences? And are there more constructive alternatives?

Collaboration and competition in research – Special Issue

Hot off the press: a special issue of Higher Education Policy, co-edited by Peter van den Besselaar (Free University, Amsterdam), Sven Hemlin (University of Gothenburg, Sweden) and our colleague Inge van der Weijden (CWTS, Leiden University). The special issue is an outcome of one of the tracks at the 2010 EASST (European Association for the Study of Science and Technology) conference in Trento, Italy. All papers zoom in on competition and collaboration, two increasingly dominant components of research both within and between organizations, and often demanded simultaneously. What is the relation between the two, and what are their effects on scientific quality and on higher education?

This interview with Van den Besselaar for Inside Higher Ed zooms in on one of the articles in the special issue. To what extent is success in academic careers determined by cultural, social and intellectual capital, and organisational and contextual factors? Van Balen, Van Arensbergen, Van der Weijden and Van den Besselaar performed a literature study, held interviews, and compared the careers of pairs of similar researchers who were considered talented in their early career and either stayed in or left academia. Their findings suggest that there is not one decisive factor that determines which talented researchers continue or discontinue their academic careers. Some factors were found to be important (e.g. social capital), whereas others were not (cultural and intellectual capital). Interestingly, Van Balen et al. did not find a “systematic relationship between the career success and the academic performance of highly talented scholars, measured as the number of publications and citations.” (pp. 330-331)

