The new Dutch research evaluation protocol

From 2015 onwards, the societal impact of research will be a more prominent measure of success in the evaluation of research in the Netherlands. Less emphasis will be put on the number of publications, while vigilance about research integrity will be increased. These are the main elements of the new Dutch Standard Evaluation Protocol, which was published a few weeks ago.

The new protocol aims to guarantee, improve, and make visible the quality and relevance of scientific research at Dutch universities and institutes. Three aspects are central: scientific quality; societal relevance; and feasibility of the research strategy of the research groups involved. As is already the case in the current protocol, research assessments are organized by institution, and the institutional board is responsible. Nationwide comparative evaluations by discipline are possible, but the institutions involved have to agree explicitly to organize their assessments in a coordinated way to realize this. In contrast to performance-based funding systems, the Dutch system does not have a tight coupling between assessment outcomes and funding for research.

This does not mean, however, that research assessments in the Netherlands have no consequences. On the contrary, these may be quite severe, but they will usually be implemented by the university management with considerable leeway for interpretation of the assessment results. The main channel through which Dutch research assessments have implications is the reputation gained or lost by the research leaders involved. The effectiveness of the assessments is often decided by the way the international committee that performs the evaluation goes about its work. If its members see it as their main mission to celebrate their nice Dutch colleagues (as has happened in the recent past), the results will be complimentary but not necessarily very informative. On the other hand, they may also punish groups by using criteria that are not actually valid for those specific groups, although they may be standard for the discipline as a whole (this has also happened, for example when book-oriented groups work in a journal-oriented discipline).

The protocol does not include a uniform set of requirements or indicators. The specific mission of the research institutes or university departments under assessment is the guiding principle. As a result, a group whose research is mainly aimed at practical impact may be evaluated with different criteria from a group that aims to work on the international frontier of basic research. The protocol is not unified around substance but around procedure. Each group has to be evaluated every six years. Another new element is that the scale for assessment has been changed from a five-point to a four-point scale, ranging from “unsatisfactory”, via “good” and “very good” to “excellent”. This scale will be applied to all three dimensions: scientific quality, societal relevance, and feasibility.

The considerable freedom that the peer committees have in evaluating Dutch research has been maintained in the new protocol. Therefore, it remains to be seen what the effects will be of the novel elements in the protocol. In assessing the societal relevance of research, the Dutch are following their British peers. Research groups will have to construct “narratives” which explain the impact their research has had on society, understood broadly. It is not yet clear how these narratives will be judged according to the scale. The criteria for feasibility are even less clear: according to the protocol a group has an “excellent” feasibility if it is “excellently equipped for the future”. Well, we’ll see how this works out.

With less emphasis on the number of publications in the new protocol, the Dutch universities, the funding agency NWO and the academy of science KNAW (who are collectively responsible for the protocol) have also responded to the increased anxiety about “perverse effects” in the research system triggered by the ‘Science in Transition’ group and to recent cases of scientific fraud. The Dutch minister of Education, Culture and Science, Jet Bussemaker, welcomed this change. “Productivity and speed should not be leading considerations for researchers”, she said on receiving the new protocol. I fully agree with this statement, yet this aspect of the protocol will also have to stand the test of practice. In many ways, the number of publications is still a basic building block of scientific or scholarly careers. For example, the h-index is very popular in the medical sciences (Tijdink, Rijcke, Vinkers, Smulders, & Wouters, 2014). This index combines the number of publications of a researcher with the citation impact of these articles in such a way that the h-index can never be higher than the total number of publications. This means that if researchers are compared according to the h-index, the most productive ones will prevail. We will have to wait and see whether the new evaluation protocol will be able to withstand this type of reward for high levels of article production.
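To make the arithmetic concrete, here is a minimal sketch in Python of the usual definition of the h-index: the largest number h such that a researcher has h papers with at least h citations each. The function name and the citation counts are invented purely for illustration; they do not come from the protocol or from the study cited above.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the first `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical researcher A: three very highly cited papers.
few_but_influential = [500, 300, 200]
# Hypothetical researcher B: ten papers with ten citations each.
many_but_modest = [10] * 10

print(h_index(few_but_influential))  # capped at 3, the number of papers
print(h_index(many_but_modest))      # 10
```

In this made-up comparison, researcher A has a thousand citations and researcher B only a hundred, yet B ends up with the higher h-index simply because A has published too few papers to score above 3.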

Reference: Tijdink, J. K., Rijcke, S. De, Vinkers, C. H., Smulders, Y. M., & Wouters, P. (2014). Publicatiedrang en citatiestress. Nederlands Tijdschrift Voor Geneeskunde, 158, A7147.

Selling science to Nature

On Saturday 22 December, the Dutch national newspaper NRC published an interview with Hans Clevers, professor of molecular genetics and president of the Royal Netherlands Academy of Arts and Sciences (KNAW). The interview is the latest in a series of public appearances following Clevers’ installation as president in 2012, in which he responds to current concerns about the need for revisions in the governance of science. The recent Science in Transition initiative, for instance, stirred quite some debate in the Netherlands, also within the Academy. One of the most hotly debated issues is that of quality control, an issue that encompasses the implications of increasing publication pressure, purported flaws in the peer review system, impact factor manipulation, and the need for new forms of data quality management.

Clevers is currently combining the KNAW presidency with his group leadership at the Hubrecht Institute in Utrecht. In both roles he actively promotes data sharing. He told the NRC that he encourages his own researchers to share all findings. “Everything is for the entire lab. Asians in particular sometimes need to be scolded for trying to keep things to themselves.” When it comes to publishing the findings, it is Clevers who decides who contributed most to a particular project and who deserves to be first author. “This can be a big deal for the careers of PhD students and post-docs.” The articles for ‘top journals’ like Nature or Science he always writes himself. “I know what the journals expect. It requires great precision. A title consists of 102 characters. It should be spot-on in terms of content, but it should also be exciting.”

Clevers does acknowledge some of the problems with the current governance of science: the issue of data sharing and mistrust mentioned above, but also, for instance, the systematic imbalance in the academic reward system when it comes to appreciation for teaching. However, he does not seem very concerned with publication pressure. He has argued on numerous occasions that publishing is simply part of daily scientific life. According to him, the number of articles is not a leading criterion. In most fields, it’s the quality of the papers that matters most. With these statements Clevers clearly places himself within the mainstream view on scientific management. But there are also dissenting opinions, and sometimes they are voiced by other prominent scientists from the same field. Last month, Nobel Prize winner Randy Schekman, professor of molecular and cell biology at UC Berkeley, declared a boycott on three top-tier journals at the Nobel Prize ceremony in Stockholm. Schekman argued that Nature, Cell, Science and other “luxury” journals are damaging the scientific process by artificially restricting the number of papers they accept, by making improper use of the journal impact factor as a marketing tool, and by depending on editors that favor spectacular findings over soundness of the results.

The Guardian published an article in which Schekman reiterated his critique. The newspaper also made an inventory of the reactions of the editors-in-chief of Nature, Cell and Science. They washed their hands of the matter. Some even shifted the problems onto the scientists themselves. Philip Campbell, editor-in-chief of Nature, referred to a recent survey of the Nature Publishing Group which revealed that “[t]he research community tends towards an over-reliance in assessing research by the journal in which it appears, or the impact factor of that journal.”

In a previous blog post we paid attention to a call for an in-depth study of the editorial policies of Nature, Science, and Cell by Jos Engelen, president of the Netherlands Organization for Scientific Research (NWO). It is worth reiterating some parts of his argument. According to Engelen the reputation of these journals, published by commercial publishers, is based on ‘selling’ innovative science derived from publicly funded research. Their “extremely selective publishing policy” has turned these journals into ‘brands’ that have ‘selling’ as their primary interest, and not, for example, “promoting the best researchers.” Here we see the contours of a disagreement with Clevers. Without wanting to read too much into his statements, Clevers on more than one occasion treats the status and quality of Nature, Cell and Science as apparently self-evident, as the main current of thought would have it. But in the NRC interview Clevers also does something else: by explaining his policy of writing the ‘top papers’ himself, he also reveals that these papers are as much the result of craft, reputation and access as they are of an ‘essential’ quality of the science behind them. Knowing how to write attractive titles is a start, but it is certainly not the only skill needed in this scientific reputation game.

The stakes are high with regard to scientific publishing; that much is clear. Articles in ‘top’ journals can make, break or sustain careers. One possible explanation for the status of these journals is of course that researchers have become highly reliant on external funding for the continuation of their research. And highly cited papers in high impact journals have become the main ‘currency’ in science, as theoretical physicist Jan Zaanen called it in a lecture at our institute. The fact that articles in top journals serve as de facto proxies for the quality of researchers is perhaps not problematic in itself (or is it?). But it certainly becomes tricky if these same journals increasingly treat short-term news-worthiness as an important criterion in their publishing policies, and if peer review committee work also increasingly revolves around selecting those projects that are most likely to have short-term success. Frank Miedema (one of the initiators of Science in Transition), among others, argues in his booklet Science 3.0 that this is the case. Clearly, there is a need for thorough research into these dynamics. How prevalent are they? And what are the potential consequences for longer-term research agendas?

Diversity in publication cultures II

As noted in the previous post on the topic of diversity in publication cultures, the recent DJA publication, “Kennis over publiceren. Publicatietradities in de wetenschap”, presents interesting and valuable personal experiences. At the same time, the booklet tends to cut corners and make rather crude statements about the role of evaluation and indicators. Often, the individual life stories are not properly contextualized. For example, physicist Tjerk Oosterkamp claims that citation analysis is “not at all” appropriate for experimental physics. According to him, the use of citation scores in evaluation would encourage researchers to stick to “simple things” and shy away from more daring and risky projects. But is this true? Many initially risky projects attracted quite a lot of citations later. As far as I know, we do not yet have a lot of evidence about the effect of evaluations and performance indicators on risk behavior in science. We do indeed have some indications that researchers tend to avoid risky projects, especially in writing applications for externally funded projects. Yet, we do not know whether this means that researchers are taking fewer risks across the board.

Another objection is that citation patterns may reflect current fashions rather than the most valuable research. I think this is an important point. For example, the recent hype about graphene research in physics may prove to be less valuable than expected. Citations represent impact on short-term communication within the relevant research communities. This is different from long-term impact on the body of knowledge. There is a relationship between the two types of impact, but they are certainly not identical.

A second example of cutting corners is the statement by the editors in one of the essays of the DJA publication that “there is not much support among scientists for bibliometric analysis” (p. 25). Well, to be honest, this varies quite strongly. In many areas in the natural and biomedical sciences quantitative performance analysis is actually quite hot. Also, we see a tendency among researchers in the humanities and social sciences to try to find a cure for the lack of publication data in Google Scholar, which often, albeit not always, has a much better coverage of these areas. They are sometimes even willing to turn a blind eye to the quite considerable problems with the accuracy and reliability of these data. So, the picture is much more complicated than the image of bibliometrics being performed top-down on the unhappy researcher.

Notwithstanding these shortcomings, the DJA booklet presents important dilemmas and problems. Perhaps the legal scholar Carla Sieburgh presents the problem most clearly: quality can in the end only be judged by experts. However, there is no time to have external reviewers read all the material. Hence the shift towards measurement. But this tends to lead us away from the content. In every discipline, some solution to this dilemma needs to be found, probably by striking a discipline-specific balance between objectified analysis from outside and internalized quality control by experts. This search for the optimal balance is especially important in those fields where quality control has been introduced relatively recently.

Diversity in publication cultures

Last December, the “Young Academy” (DJA), a part of the Royal Netherlands Academy of Arts and Sciences, published an interesting Dutch-language booklet about the experiences of their members (somewhat younger professors and assistant professors) in publishing and evaluations, “Kennis over publiceren. Publicatietradities in de wetenschap”. It tries to chart the enormous diversity in publication and citation traditions across, and even within, disciplines. The booklet aims to contribute to the increasingly important discussion about the best way to communicate scientific and scholarly results and about the current evaluation protocols. It combines a general overview of the debate and part of the literature on publishing and citation with text boxes in which DJA members are interviewed about their own careers. The latter part is the most interesting of the booklet.

The DJA publication confirms the main themes of the new CWTS research programme, which we also published last December. First, we are witnessing an increasingly formalized evaluation culture in science and scholarship, which is now also covering the humanities and social sciences. Second, there is not one proper way to publish and evaluate one’s work. What works well in the geosciences may be very inappropriate for chemistry. Books are still very important in some fields, and virtually non-existent in others. Third, these differences do not map neatly onto the boundaries between the natural sciences, social sciences and humanities. There are important differences within these areas and even within disciplines.

This creates a challenge for research evaluations, and this theme is the main thread in our new research programme “Merit, Expertise and Measurement”. On the one hand, in order to be fair, evaluation criteria need to be standardized and generic. On the other hand, the characteristics of different fields need to be taken into account. The current evaluation system in the Netherlands, as well as the way project proposals are evaluated by the Dutch Science Foundation NWO, tends to be biased towards standardized criteria based on an implicit natural science model. As a result, publications in international journals with high impact factors have become the gold standard. This disadvantages other forms of scholarly communication, tends to devalue translational research which is aimed at societal impact rather than scientific excellence, and makes life more difficult for researchers who publish books or write in languages other than English.

These differences in publication cultures are often underestimated. For example, most people involved in evaluation tend to think that peer review is universally accepted. But according to Janneke Gerards, a legal scholar in human rights, it is common practice in her field that journal editorial boards decide themselves about submitted manuscripts. She does not see much added value in external peer review. In other fields, however, it would be unacceptable not to have external peer review; often even double-blind reviews are required. A comparable variation can be found regarding the value of citation analysis for research assessment. For paleo-climatologist Appy Sluijs citation analysis is a reasonably good method. But according to experimental physicist Tjerk Oosterkamp, whose field is also oriented to international journals, citation analysis is “not at all a good instrument”. His guess is that this would promote mainstream research at the cost of truly innovative work. The evidence for this expectation is lacking, however.

This is true of the booklet as a whole. The authors as well as the interviewees mostly extrapolate from their own experiences. The DJA publication therefore does not offer a good overall theoretical or empirical framework, nor does it offer much that is new for the field of bibliometrics. Still, I think it has been a very valuable exercise because it foregrounds the life experiences of some members of a group from the new generation of excellent scholars (measured according to their success in building a career in ground-breaking research). Building on this type of experience in a more systematic way is the main goal of our new research programme. It would be great to collaborate more in order to improve our understanding of how evaluation in research really works.
