Quality in the age of the impact factor

ISIS, the most prestigious journal in the history of science, moved house last September, and its central office is now located at the Descartes Centre for the History and Philosophy of the Sciences and Humanities at Utrecht University. The Dutch science historian H. Floris Cohen took up the position of editor in chief of the journal. No doubt this underlines the international reputation of the community of historians of science in the Netherlands. Being the editor of the central journal in one's field surely is a mark of esteem and quality.

The opening of the editorial office in Utrecht was celebrated with a symposium entitled “Quality in the age of the impact factor”. Since quality of research in history is intimately intertwined with the quality of writing, it seemed particularly apt to call attention to the role of impact factors in humanities fields. I used the occasion to pose the question of how we actually define scientific and scholarly quality. How do we recognize quality in our daily practices? And how can this variety of practices be understood theoretically? Which approaches in the field of science and technology studies are most relevant?

In the same month, Pleun van Arensbergen obtained her PhD with a very interesting dissertation that deals with some of these issues, “Talent Proof. Selection Processes in Research Funding and Careers”. Van Arensbergen did her thesis work at the Rathenau Institute in The Hague. The quality of research is increasingly seen as mainly the result of the quality of the people involved. Hence, universities “have openly made it one of their main goals to attract scientific talent” (van Arensbergen, 2014, p. 121). A specific characteristic of this “war for talent” in the academic world is that there is an oversupply of talents and a relative lack of career opportunities, leading to a “war between talents”. The dissertation is a thorough analysis of success factors in academic careers. It is an empirical analysis of how the Dutch science foundation NWO selects early career talent in its Innovational Research Incentives Scheme. The study surveyed researchers about their definitions of quality and talent. It combines this with an analysis of both the outcome and the process of this talent selection. Van Arensbergen paid specific attention to the gender distribution and to the difference between successful and unsuccessful applicants.

Her results point to a discrepancy between the common notion among researchers that talent is immediately recognizable (“you know it when you see it”) and the fact that there are very small differences between candidates who get funded and those who do not. The top and the bottom of the distribution of quality among proposals and candidates are relatively easy to detect. But the group of “good” and “very good” proposals is still too large to be funded. Van Arensbergen and her colleagues did not find a “natural threshold” above which the successful talents can be placed. On the contrary, in one of her chapters they find that researchers who leave the academic system due to lack of career possibilities regularly score higher on a number of quality indicators than those who are able to continue a research career. “This study does not confirm that the university system always preserves the highly productive researchers, as leavers were even found to outperform the stayers in the final career phase” (van Arensbergen, 2014, p. 125).

Based on the survey, her case studies and her interviews, Van Arensbergen also concludes that productivity and publication records have become rather important for academic careers. “Quality nowadays seems to a large extent to be defined as productivity. Universities seem to have internalized the performance culture and rhetoric to such an extent that academics even define and regulate themselves in terms of dominant performance indicators like numbers of publications, citations or the H-index. (…) Publishing seems to have become the goal of academic labour.” (van Arensbergen, 2014, p. 125). This does not mean, however, that these indicators determine the success of a career. The study questions “the overpowering significance assigned to these performance measures in the debate, as they were not found to be entirely decisive.” (van Arensbergen, 2014, p. 126) An extensive publication record is a condition but not a guarantee for success.

This relates to another finding: the group process of panel discussions is also very important. With a variety of examples, Van Arensbergen shows how the organization of the selection process shapes the outcome. The face-to-face interview of the candidate with the panel is, for example, crucial for the final decision. In addition, the influence of the external peer reports was found to be modest.

A third finding in the talent dissertation is that success in obtaining grants feeds back into one's scientific and scholarly career. This creates a self-reinforcing mechanism, which the sociologist of science Robert Merton named the Matthew effect after the quote from the Bible: “For unto every one that hath shall be given, and he shall have abundance: but from him that hath not shall be taken even that which he hath.” (Merton, 1968). Van Arensbergen concludes that this means that differences between scholars may initially be small but will increase in the course of time as a result of funding decisions. “Panel decisions convert minor differences in quality into enlarged differences in recognition.”

Combining these three findings leads to some interesting conclusions regarding how we actually define and shape quality in academia. Although panel decisions about who to fund are strongly shaped by the organization of the selection process as well as by a host of other contextual factors (including chance), and although all researchers are aware of the uncertainties in these decisions, this does not mean that these decisions are given less weight. On the contrary, obtaining external grants has become a cornerstone for successful academic careers. Universities even devote considerable resources to making their researchers better able to acquire prestigious grants as well as external funding in general. Although this is clearly instrumental for the organization, Van Arensbergen thinks that grants have become part of the symbolic capital of a researcher and research group, and she refers to Pierre Bourdieu’s theory of symbolic capital to better understand the implications.

This brings me to my short lecture at the opening of the editorial office of ISIS in Utrecht. Although the experts on bibliometric indicators don’t generally see the Journal Impact Factor as an indicator of quality, socially it seems to partly function like one. But indicators are not alone in shaping how we in practice identify, and thereby define, talent and quality. They flow together with the way quality assurance and measurement processes are organized, the social psychology of panel discussions, the extent to which researchers are visible in their networks, etc. In these complex contextual interactions, indicators do not determine outcomes; rather, they are ascribed meaning depending on the situation in which the researchers find themselves. A good way to think about this, in my view, has been developed in the field of material semiotics. This approach, which has its roots in the French actor-network theory of Bruno Latour and Michel Callon, does not accept a fundamental rupture in reality between the material and the symbolic. Reality as such is the result of complex and interacting translation processes. This is an excellent philosophical basis for understanding how scientific and scholarly quality emerge. I see quality not as an attribute of an academic persona or of a particular piece of work, but as the result of the interaction between a researcher (or a manuscript) and the already existing scientific or scholarly infrastructure (e.g. the body of published studies). If this interaction creates a productive friction (meaning that there is enough novelty in the contribution but not so much that it is incompatible with the already existing body of work), we see the work or scholar as of high quality. In other words, quality simply does not (yet) exist outside of the systems of quality measurement. The implication of this is that quality itself is a historical category. It is not an invariant but a culturally and historically specific concept that changes and morphs over time.
In fact, the history of science is the history of quality. I hope historians of science will take up the challenge to map this history in more empirical and theoretical sophistication than has been done so far.

Literature:

Merton, R. K. (1968). The Matthew Effect in Science. Science, 159, 56–62.

Van Arensbergen, P. (2014). Talent proof: Selection processes in research funding and careers. The Hague, Netherlands: Rathenau Institute. Retrieved from http://www.worldcat.org/title/talent-proof-selection-processes-in-research-funding-and-careers/oclc/890766139&referer=brief_results

 

Developing guiding principles and standards in the field of evaluation – lessons learned

This is a guest blog post by professor Peter Dahler-Larsen. The reflections below are a follow-up to his keynote at the STI conference in Leiden (3-5 September 2014) and the special session at STI on the development of quality standards for science & technology indicators. Dahler-Larsen holds a chair at the Department of Political Science, University of Copenhagen. He is a former president of the European Evaluation Society and author of The Evaluation Society (Stanford University Press, 2012).

Lessons learned about the development of guiding principles and standards in the field of evaluation – A personal reflection

Professor Peter Dahler-Larsen, 5 October 2014

Guidelines are symbolic, not regulatory

The limited institutional status of guiding principles and standards should be understood as a starting point for the debate. In the initial phases of development of such standards and guidelines, people often have very strong views. But only the state can enforce laws. To the extent that guidelines and standards merely express some official views of a professional association that has no institutional power to enforce them, standards and guidelines will have limited direct consequences for practitioners. The discussion becomes clearer once it is recognized that standards and guidelines thus primarily have a symbolic and communicative function, not a regulatory one. Practitioners will remain free to practise however they like, even after guidelines have been adopted.

Design a process of debate and involvement

All members of a professional association should have the opportunity to comment on a draft version of guidelines/standards. An important component in the adoption of guidelines/standards is the design of a proper organizational process that involves the composition of a draft by a select group of recognized experts, an open debate among members, and an official procedure for the adoption of standards/guidelines as organizational policy.

Acknowledge the difference between minimum and maximum standards

Minimum standards must be complied with in all situations. Maximum standards are ideal principles worth striving for, although they will not be accomplished in any particular situation. It often turns out that there will be many maximum principles in a set of guidelines, although that is not what most people understand by “standards.” For that reason I personally prefer the term guidelines or guiding principles rather than “standards.”

Think carefully about guidelines and methodological pluralism

Advocates of a particular method often think that the methodological rules connected to their own method define quality as such in the whole field. For that reason, they are likely to insert their own methodological rules into the set of guidelines. As a consequence, guidelines can be used politically to promote one set of methods or one particular paradigm rather than another. Great care should be exercised in the formulation of guidelines to make sure that pluralism remains protected. For example, in evaluation the rule is that if you subscribe to a particular method, you should have high competence in the chosen method. But that goes for all methods.

Get beyond the “but that’s obvious” argument

Some argue that it is futile to formulate a set of guidelines because at that level of generality, it is only possible to state some very broad and obvious principles with which every sensible person must agree. The argument sounds plausible when you hear it, but my experience suggests otherwise for a number of reasons. First, some people have just not thought about a very bad practice (for example, doing evaluation without written Terms of Reference). Once you see that someone has formulated a guideline against this, you are likely to start paying attention to the problem. Just because a principle is obvious to some does not mean that it is obvious to all. Second, although there may be general agreement about a principle (such as “do no unnecessary harm” or “take general social welfare into account”), there can be strong disagreement about the interpretations and implications of the principle in practice. Third, a good set of guiding principles will often comprise at least two principles that are somewhat in tension with each other, for example the principle of being quick and useful versus the principle of being scientifically rigorous. To sort out exactly which kind of tension between these two principles one can live with in a concrete case turns out to be a matter of complicated professional judgment. So, get beyond the “that’s obvious” argument.

Recognize the fruitful uses of guidelines

Among the most important uses of guidelines in evaluation are:

– In application situations, good evaluators can explain their practice with reference to broader principles

– In conferences, guidelines can stimulate insightful professional discussions about how to handle complicated cases

– Books and journals can make use of guidelines as inspiration for the development of an ethical awareness among practitioners. For example, see Michael Morris’s work in the field of evaluation.

– There is great use of guidelines in teaching and in other forms of socialization of evaluators.

Respect the multiplicity of organizations

If, say, the European Evaluation Society wants to adopt a set of guidelines, it should be respected that, say, the German and the Swiss associations already have their own guidelines. Furthermore, some professional associations (say, psychologists) also have guidelines. A professional association should take such overlaps seriously and find ways to exchange views and experiences with guidelines across national and organizational borders.

Professionals are not alone, but relations can be described in guidelines, too

It is often argued that one of the major problems in bad evaluation practice is the behavior of commissioners. Some therefore think that guidelines describing good evaluation practice are in vain until the behavior of commissioners (and perhaps other users of evaluation) is included in the guidelines, too. However, there is no particular reason why the guidelines cannot describe a good relation and a good interaction between commissioners and evaluators. Remember, guidelines have no regulatory power. They merely express the official norms of the professional association. Evaluators are allowed to express what they think a good commissioner should do or not do. In fact, explicit guidelines can help clarify mutual and reciprocal role expectations.

Allow for regular reflection, evaluation and revision of guidelines

At regular intervals, guidelines should be debated, evaluated and revised. The AEA guidelines, for example, have been revised and now reflect values regarding culturally competent evaluation that were not in earlier versions. Guidelines are organic and reflect a particular socio-historical situation.

Sources:

Michael Morris (2008). Evaluation Ethics for Best Practice. Guilford Press.

American Evaluation Association Guiding principles
