Quality in the age of the impact factor

ISIS, the most prestigious journal in the history of science, moved house last September, and its central office is now located at the Descartes Centre for the History and Philosophy of the Sciences and Humanities at Utrecht University. The Dutch science historian H. Floris Cohen took up the position of editor in chief of the journal. No doubt this underlines the international reputation of the community of historians of science in the Netherlands. Being the editor of the central journal in one's field surely is a mark of esteem and quality.

The opening of the editorial office in Utrecht was celebrated with a symposium entitled “Quality in the age of the impact factor”. Since the quality of research in history is intimately intertwined with the quality of writing, it seemed particularly apt to call attention to the role of impact factors in humanities fields. I used the occasion to pose the question of how we actually define scientific and scholarly quality. How do we recognize quality in our daily practices? And how can this variety of practices be understood theoretically? Which approaches in the field of science and technology studies are most relevant?

In the same month, Pleun van Arensbergen graduated with a very interesting PhD dissertation dealing with some of these issues, “Talent Proof. Selection Processes in Research Funding and Careers”. Van Arensbergen did her thesis work at the Rathenau Institute in The Hague. The quality of research is increasingly seen as mainly the result of the quality of the people involved. Hence, universities “have openly made it one of their main goals to attract scientific talent” (van Arensbergen, 2014, p. 121). A specific characteristic of this “war for talent” in the academic world is that there is an oversupply of talent and a relative lack of career opportunities, leading to a “war between talents”. The dissertation is a thorough analysis of success factors in academic careers. It is an empirical analysis of how the Dutch science foundation NWO selects early-career talent in its Innovational Research Incentives Scheme. The study surveyed researchers about their definitions of quality and talent, and combines this with an analysis of both the outcome and the process of this talent selection. Van Arensbergen paid specific attention to the gender distribution and to the difference between successful and unsuccessful applicants.

Her results point to a discrepancy between the common notion among researchers that talent is immediately recognizable (“you know it when you see it”) and the fact that there are very small differences between candidates who get funded and those who do not. The top and the bottom of the distribution of quality among proposals and candidates are relatively easy to detect. But the group of “good” and “very good” proposals is still too large to be funded. Van Arensbergen and her colleagues did not find a “natural threshold” above which the successful talents can be placed. On the contrary, in one of her chapters they find that researchers who leave the academic system due to a lack of career possibilities regularly score higher on a number of quality indicators than those who are able to continue a research career. “This study does not confirm that the university system always preserves the highly productive researchers, as leavers were even found to outperform the stayers in the final career phase” (van Arensbergen, 2014, p. 125).

Based on the survey, her case studies and her interviews, Van Arensbergen also concludes that productivity and publication records have become rather important for academic careers. “Quality nowadays seems to a large extent to be defined as productivity. Universities seem to have internalized the performance culture and rhetoric to such an extent that academics even define and regulate themselves in terms of dominant performance indicators like numbers of publications, citations or the H-index. (…) Publishing seems to have become the goal of academic labour.” (van Arensbergen, 2014, p. 125). This does not mean, however, that these indicators determine the success of a career. The study questions “the overpowering significance assigned to these performance measures in the debate, as they were not found to be entirely decisive.” (van Arensbergen, 2014, p. 126). An extensive publication record is a condition for success, but not a guarantee of it.

This relates to another finding: the group processes in panel discussions are also very important. With a variety of examples, Van Arensbergen shows how the organization of the selection process shapes the outcome. The face-to-face interview of the candidate with the panel, for example, is crucial for the final decision. In addition, the influence of the external peer reports was found to be modest.

A third finding in the talent dissertation is that success in obtaining grants feeds back into one's scientific and scholarly career. This creates a self-reinforcing mechanism, which the sociologist of science Robert Merton named the Matthew effect after the quote from the Bible: “For unto every one that hath shall be given, and he shall have abundance: but from him that hath not shall be taken even that which he hath.” (Merton, 1968). Van Arensbergen concludes that differences between scholars may initially be small but will increase in the course of time as a result of funding decisions. “Panel decisions convert minor differences in quality into enlarged differences in recognition.”
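The feedback mechanism described here can be made concrete with a toy model. The sketch below is only an illustration under stated assumptions, not the model used in the dissertation: the function name, the scoring rule (quality plus a bonus per earlier grant) and all numbers are hypothetical.

```python
def run_funding_rounds(quality, rounds, funded_per_round,
                       track_record_weight=0.1, initial_grants=None):
    """Toy model of repeated panel decisions (hypothetical, for illustration).

    Each round the panel scores every candidate as
    quality + track_record_weight * (grants obtained so far)
    and funds the top `funded_per_round` candidates.
    Returns the number of grants per candidate after all rounds.
    """
    grants = list(initial_grants) if initial_grants else [0] * len(quality)
    for _ in range(rounds):
        scores = [q + track_record_weight * g for q, g in zip(quality, grants)]
        # Rank candidate indices from highest to lowest panel score.
        ranked = sorted(range(len(quality)), key=lambda i: scores[i], reverse=True)
        for i in ranked[:funded_per_round]:
            grants[i] += 1
    return grants


# Four candidates whose quality differs only marginally.
quality = [0.52, 0.51, 0.50, 0.49]
grants = run_funding_rounds(quality, rounds=5, funded_per_round=2)
print(grants)  # [5, 5, 0, 0]

# A slightly weaker candidate who already holds one grant keeps winning
# once the track record counts in the panel score.
with_feedback = run_funding_rounds([0.50, 0.52], rounds=3, funded_per_round=1,
                                   track_record_weight=0.1, initial_grants=[1, 0])
without_feedback = run_funding_rounds([0.50, 0.52], rounds=3, funded_per_round=1,
                                      track_record_weight=0.0, initial_grants=[1, 0])
print(with_feedback, without_feedback)  # [4, 0] [1, 3]
```

In the first run a quality spread of 0.03 ends up as a difference of five grants, simply because there is a cut-off. In the second run an early grant outweighs a small quality advantage as long as the track record is part of the score. Real panels are of course far less mechanical than this.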

Combining these three findings leads to some interesting conclusions regarding how we actually define and shape quality in academia. Although panel decisions about whom to fund are strongly shaped by the organization of the selection process as well as by a host of other contextual factors (including chance), and although all researchers are aware of the uncertainties in these decisions, this does not mean that these decisions are given less weight. On the contrary, obtaining external grants has become a cornerstone of successful academic careers. Universities even devote considerable resources to making their researchers better able to acquire prestigious grants as well as external funding in general. Although this is clearly instrumental for the organization, Van Arensbergen thinks that grants have become part of the symbolic capital of a researcher and research group, and she refers to Pierre Bourdieu’s theory of symbolic capital to better understand the implications.

This brings me to my short lecture at the opening of the editorial office of ISIS in Utrecht. Although the experts on bibliometric indicators don’t generally see the Journal Impact Factor as an indicator of quality, socially it seems to function partly as one. But indicators are not alone in shaping how we in practice identify, and thereby define, talent and quality. They flow together with the way quality assurance and measurement processes are organized, the social psychology of panel discussions, the extent to which researchers are visible in their networks, etc. In these complex contextual interactions, indicators do not determine outcomes; rather, they are ascribed meaning dependent on the situation in which the researchers find themselves.

A good way to think about this, in my view, has been developed in the field of material semiotics. This approach, which has its roots in the French actor-network theory of Bruno Latour and Michel Callon, does not accept a fundamental rupture in reality between the material and the symbolic. Reality as such is the result of complex and interacting translation processes. This is an excellent philosophical basis for understanding how scientific and scholarly quality emerges. I see quality not as an attribute of an academic persona or of a particular piece of work, but as the result of the interaction between a researcher (or a manuscript) and the already existing scientific or scholarly infrastructure (e.g. the body of published studies). If this interaction creates a productive friction (meaning that there is enough novelty in the contribution but not so much that it is incompatible with the already existing body of work), we see the work or scholar as of high quality. In other words, quality simply does not (yet) exist outside of the systems of quality measurement. The implication of this is that quality itself is a historical category. It is not an invariant but a culturally and historically specific concept that changes and morphs over time. In fact, the history of science is the history of quality. I hope historians of science will take up the challenge to map this history with more empirical and theoretical sophistication than has been done so far.


Merton, R. K. (1968). The Matthew Effect in Science. Science, 159, 56–62.

Van Arensbergen, P. (2014). Talent proof: Selection processes in research funding and careers. The Hague, Netherlands: Rathenau Institute. Retrieved from http://www.worldcat.org/title/talent-proof-selection-processes-in-research-funding-and-careers/oclc/890766139&referer=brief_results



Who is the modern scientist? Lecture by Steven Shapin

There are now many historical studies of what’s been called scientists’ personæ: the typifications, images, and expectations attached to people who do scientific work. There has been much less interest in the largely managerial and bureaucratic exercises of counting scientists: finding out how many there are, of what sorts, working in what institutions. This talk first describes how and why scientists came to be counted from about the middle of the twentieth century and then relates those statistical exercises to changing senses of who the scientist was, what scientific inquiry was, and what it was good for.

Here’s more information, including how to register

Date: Thursday 28 November 2013

Time: 5-7 pm

Place: Felix Meritis (Teekenzaal), Keizersgracht 324, Amsterdam

Teaching in Madrid

Started my visiting professorship at the Faculty of Library and Information Science, Complutense University in Madrid today with a nice class discussion about research evaluation. Here is the presentation I gave about the role of information science in research evaluation.

(The) Performance (of) Measurement

Link: http://www.socialsciences.leiden.edu/cwts/

High time to start this blog again! November and December were too busy to keep up with it, as I had to combine getting to know CWTS better with preparing the transfer of the Virtual Knowledge Studio to the e-Humanities Group at the KNAW. I am currently being overwhelmed by positive responses to the inaugural lecture that I gave last Friday in the beautiful Academy Building of Leiden University. In the lecture I sketched my plans for future research at CWTS against the backdrop of the history of performance measurement in the sciences and of the field of scientometrics. The hall was packed and I have received many enthusiastic emails since. It means that we will have firm ground on which to build up this research agenda.

So let me summarize the main points. In the past decades, research evaluation has increased in size and complexity, and formal performance indicators are playing a crucial role. This is very different indeed from the times when Ton van Raan started his scientometric research and CWTS in the 1980s. The competition between different indicator research groups and scientometric institutes has also led to a proliferation of indicators. The differences between them are not always clear, nor is the exact way in which they are defined, measured and computed. This means that it is now becoming more urgent to include the critique of indicators in the creation of new ones, and to spell out the limitations of these indicators to audiences that are not yet accustomed to them. This is also the reason why CWTS will publish a manual on our indicators later this year.

What does citation actually mean? This is the first research theme that I will explore in the coming years. This question was already tackled by the students of the American sociologist of science Robert Merton in the early days of scientometrics, and it is still highly relevant. It is also a bit of a puzzle. At higher levels of aggregation, such as large groups of researchers or universities, many studies have shown a correlation between citation frequency and quality of research, reputation of researchers or scientific relevance of the work. However, as soon as we look at the underlying mechanisms at a more fine-grained level, to understand where this correlation comes from, the correlation seems to disappear. Of course, this may simply mean that it depends on the level of aggregation and also on the exact definition of quality, reputation, and relevance. In itself this is not strange, but it remains unsatisfactory. I will try to dig into this in the coming years, also in relation to the renewed interest in citation theories.

A related line of work in this research theme, more important perhaps, is the impact of evaluation and performance indicators on research. How do evaluations actually work out in large research organizations such as universities and hospitals? Are researchers changing their communication and research practices because of the use of citation frequencies in evaluation? Are they citing with this in mind? How will the organization of research be affected? We do not know a lot about these implications of the rise of citation cultures in research, yet it is urgent to understand this better in order to improve the quality of evaluations.

The second research theme I will contribute to has already started at CWTS in the last year. It is fundamental research into the mathematical and statistical properties of performance indicators. Do we actually need all these indicators that we see parading in the pages of Scientometrics? How do they actually relate to each other in terms of their mathematical properties and definitions? And how do they behave when applied to the existing citation databases and research groups? We know that some of these indicators are actually not fit for use in research evaluation, such as the Journal Impact Factor and the Hirsch Index. (Yet these are among the most popular indicators!) But we currently do not have a systematic overview of the properties of all performance indicators. Consistency and reliability are important issues in this line of work. In this area, I am particularly interested in the connection between the mathematical questions and the sociological questions. Can this combination bring us more robust general design principles for performance indicators? Second, I will contribute by building simulations of the scientific publication and communication system. I hope this will in the long term build an experimental environment and set of tools to simulate indicators before they are applied in a management or policy context.
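For readers who have not seen these two indicators spelled out, here is a minimal sketch of the standard definition of the Hirsch index and of a simplified impact-factor-style average. The function names and the citation counts are made up for illustration, and the second function deliberately ignores the details of the official Journal Impact Factor (citation windows, what counts as a citable item). The examples show one commonly raised concern, which is not necessarily the full argument meant above: both indicators compress a highly skewed distribution into a single number.

```python
from statistics import median


def h_index(citations):
    """Largest h such that h publications have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def mean_citations_per_item(citations):
    """Simplified impact-factor-style average: total citations / number of items."""
    if not citations:
        raise ValueError("need at least one item")
    return sum(citations) / len(citations)


# Two very different (hypothetical) citation records with the same h-index.
print(h_index([100, 50, 3, 1]))  # 3
print(h_index([3, 3, 3]))        # 3

# A hypothetical journal where one article collects most of the citations:
# the mean is far above what a typical article receives.
journal = [90, 5, 3, 1, 1]
print(mean_citations_per_item(journal))  # 20.0
print(median(journal))                   # 3
```

A simulation environment of the kind mentioned above would let one feed many synthetic citation records through functions like these and compare how the indicators behave before using them on real research groups.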

The third research theme, which I think will be very exciting in the coming years, is the area of data and knowledge visualization. It is now possible to create sophisticated science maps on the basis of large data sets on scientific research. The recent publication of the Atlas of Science by Katy Börner is a beautiful contribution to this work and has shown its promise. Her book also shows how sensitive these maps are to the underlying assumptions about science and scientific work. Maps have a reality effect and tend to be read as three-dimensional geographical maps. However, the use of science maps is important precisely because they can present many different dimensions. This calls for a more systematic study of the design principles of science maps. After we have established these, more user-oriented questions become pertinent. Will it be possible to present most scientometric research in the form of interactive maps of science, where the user can dig into the underlying data sets, and where uncertainties and missing values are clearly indicated?

The fourth research line I will explore in the coming years with my colleagues at CWTS is the question of data sources. It is clear that the current situation is unsatisfactory. Citation databases do not cover all scholarly fields, and the humanities and social sciences in particular are only partially represented in these databases. For many interesting evaluation questions as well as research questions, combinations of citation data and other data (investments in research, personnel, patents, cultural impacts) are needed. In small research projects this is often not too difficult, but when we are speaking of large-scale research evaluation and management, it does require a jump in the quality of data infrastructures and data integration. In the end, scientometrics is and remains a data science.

Measuring the world


Recently, I read Daniel Kehlmann’s fictional history about Alexander von Humboldt and Carl Friedrich Gauss, Die Vermessung der Welt. It is an intriguing way to write history of science, because it enables the author to insert internal dialogues which are actually quite plausible, yet by definition unprovable. The two characters are quite different and perhaps symbolize the two basic modalities in quantitative research, recognizable also within the field of scientometrics. Alexander von Humboldt is the outgoing guy, travelling the whole world. He is interested in the particulars of objects, collects huge amounts of birds, stones, insects and plants, and describes their characteristics meticulously. Gauss, on the other hand, wants to stay home and thinks about the mathematical properties of the universe. He is interested in the fundamentals of mathematical operations and suspects that they can shed light on the structure of reality. In scientometrics, these two different attitudes come together, but never without a fight. Building indicators means thinking through the mathematical properties of indicators, because these directly affect the question of what the indicator is actually supposed to measure: in technical terms, the validity of the indicator. One also needs other types of insight to understand the validity, such as insight into what researchers are actually doing in their day-to-day routines, but a firm grip on the mathematical structure of indicators is indispensable. At the same time, the other attitude is also required. Von Humboldt’s interest in statistical description gives insight into the range of phenomena that one can describe with a particular indicator. A good scientometric group, in other words, needs both people like Gauss and people like Von Humboldt. And indeed, both types are present at CWTS. Let us see how the interactions between them will stimulate new fundamental research in scientometrics and indicator building.

The book also has some interesting observations about the obsession of the key actors with measuring the world and the universe. When Alexander von Humboldt travels through South America, he meets a priest, Father Zea, who is sceptical about his expedition. He suspects that space is actually created by the people trying to measure space. He mocks Von Humboldt and reminds him of the time "when the things were not yet used to being measured". In that past, three stones were not yet equal to three leaves and fifteen grams of earth were not yet the same weight as fifteen grams of peas. It is an interesting idea, that things need to get used to being measured, especially now that we are increasingly tagging our natural and social environments with RFID tags, social networking sites and smartphone applications such as Layar, which adds a virtual reality layer of information to your current location. Later in the book, Gauss adds to this by pondering that his work in surveying (which he did for the money) did not only measure the land, but created a new reality by this act of measuring. Before, there had been only trees, moss, stones, and grass. After his work, a network of lines, angles, and numbers had been added to this. Gauss wondered whether Von Humboldt would be able to understand this.
