Developing guiding principles and standards in the field of evaluation – lessons learned

This is a guest blog post by professor Peter Dahler-Larsen. The reflections below are a follow-up of his keynote at the STI conference in Leiden (3-5 September 2014) and the special session at STI on the development of quality standards for science & technology indicators. Dahler-Larsen holds a chair at the Department of Political Science, University of Copenhagen. He is former president of the European Evaluation Society and author of The Evaluation Society (Stanford University Press, 2012).

Lessons learned about the development of guiding principles and standards in the field of evaluation – A personal reflection

Professor Peter Dahler-Larsen, 5 October 2014

Guidelines are symbolic, not regulatory

The limited institutional status of guiding principles and standards should be understood as a starting point for the debate. In the initial phases of development of such standards and guidelines, people often have very strong views. But only the state can enforce laws. To the extent that guidelines and standards merely express the official views of a professional association that has no institutional power to enforce them, they will have limited direct consequences for practitioners. The discussion becomes clearer once it is recognized that standards and guidelines thus primarily have a symbolic and communicative function, not a regulatory one. Practitioners will remain free to practice as they like, even after guidelines have been adopted.

Design a process of debate and involvement

All members of a professional association should have the opportunity to comment on a draft version of guidelines/standards. An important component in the adoption of guidelines/standards is the design of a proper organizational process that involves the composition of a draft by a select group of recognized experts, an open debate among members, and an official procedure for the adoption of standards/guidelines as organizational policy.

Acknowledge the difference between minimum and maximum standards

Minimum standards must be complied with in all situations. Maximum standards are ideal principles worth striving for, although they will not be accomplished in any particular situation. It often turns out that there will be many maximum principles in a set of guidelines, although that is not what most people understand by “standards.” For that reason I personally prefer the term guidelines or guiding principles rather than “standards.”

Think carefully about guidelines and methodological pluralism

Advocates of a particular method often think that the methodological rules connected to their own method define quality as such in the whole field. For that reason, they are likely to insert their own methodological rules into the set of guidelines. As a consequence, guidelines can be used politically to promote one set of methods or one particular paradigm rather than another. Great care should be exercised in the formulation of guidelines to make sure that pluralism remains protected. For example, in evaluation the rule is that if you subscribe to a particular method, you should have high competence in the chosen method. But that goes for all methods.

Get beyond the “but that’s obvious” argument

Some argue that it is futile to formulate a set of guidelines because at that level of generality, it is only possible to state some very broad and obvious principles with which every sensible person must agree. The argument sounds plausible when you hear it, but my experience suggests otherwise for a number of reasons. First, some people have just not thought about a very bad practice (for example, doing evaluation without written Terms of Reference). Once you see that someone has formulated a guideline against this, you are likely to start paying attention to the problem. Just because a principle is obvious to some does not mean that it is obvious to all. Second, although there may be general agreement about a principle (such as “do no unnecessary harm” or “take general social welfare into account”), there can be strong disagreement about the interpretations and implications of the principle in practice. Third, a good set of guiding principles will often comprise at least two principles that are somewhat in tension with each other, for example the principle of being quick and useful versus the principle of being scientifically rigorous. To sort out exactly which kind of tension between these two principles one can live with in a concrete case turns out to be a matter of complicated professional judgment. So, get beyond the “that’s obvious” argument.

Recognize the fruitful uses of guidelines

Among the most important uses of guidelines in evaluation are:

– In application situations, good evaluators can explain their practice with reference to broader principles

– In conferences, guidelines can stimulate insightful professional discussions about how to handle complicated cases

– Books and journals can make use of guidelines as inspiration for the development of an ethical awareness among practitioners. For example, google Michael Morris’s work in the field of evaluation.

– There is great use of guidelines in teaching and in other forms of socialization of evaluators.

Respect the multiplicity of organizations

If, say, the European Evaluation Society wants to adopt a set of guidelines, it should be respected that, say, the German and the Swiss associations already have their own guidelines. Furthermore, some professional associations (say, psychologists) also have guidelines. A professional association should take such overlaps seriously and find ways to exchange views and experiences with guidelines across national and organizational borders.

Professionals are not alone, but relations can be described in guidelines, too

It is often argued that one of the major problems in bad evaluation practice is the behavior of commissioners. Some therefore think that guidelines describing good evaluation practice are in vain until the behavior of commissioners (and perhaps other users of evaluation) is included in the guidelines, too. However, there is no particular reason why the guidelines cannot describe a good relation and a good interaction between commissioners and evaluators. Remember, guidelines have no regulatory power. They merely express the official norms of the professional association. Evaluators are allowed to express what they think a good commissioner should do or not do. In fact, explicit guidelines can help clarify mutual and reciprocal role expectations.

Allow for regular reflection, evaluation and revision of guidelines

At regular intervals, guidelines should be debated, evaluated and revised. The AEA guidelines, for example, have been revised and now reflect values regarding culturally competent evaluation that were not in earlier versions. Guidelines are organic and reflect a particular socio-historical situation.


Michael Morris (2008). Evaluation Ethics for Best Practice. Guilford Press.

American Evaluation Association Guiding principles

On exploding ‘evaluation machines’ and the construction of alt-metrics

The emergence of web-based ways to create and communicate new knowledge is affecting long-established scientific and scholarly research practices (cf. Borgman 2007; Wouters, Beaulieu, Scharnhorst, & Wyatt 2013). This move to the web is spawning a need for tools to track and measure a wide range of online communication forms and outputs. By now, there is a large differentiation in the kinds of social web tools (e.g. Mendeley, F1000, Impact Story) and in the outputs they track (e.g. code, datasets, nanopublications, blogs). The expectations surrounding the explosion of tools and big ‘alt-metric’ data (Priem et al. 2010; Wouters & Costas 2012) marshal resources at various scales and gather highly diverse groups in pursuing new projects (cf. Brown & Michael 2003; Borup et al. 2006 in Beaulieu, de Rijcke & Van Heur 2013).

Today we submitted an abstract for a contribution to Big Data? Qualitative approaches to digital research (edited by Martin Hand & Sam Hillyard and contracted with Emerald). In the abstract we propose to zoom in on a specific set of expectations around altmetrics: their alleged usefulness for research evaluation. Of particular interest to this volume is how altmetrics information is expected to enable a more comprehensive assessment of 1) social scientific outputs (under-represented in citation databases) and 2) wider types of output associated with societal relevance (not covered in citation analysis and allegedly more prevalent in the social sciences).

In our chapter we address a number of these expectations by analyzing 1) the discourse in the “altmetrics movement”, the expectations and promises formulated by key actors involved in “big data” (including commercial entities); and 2) the construction of these altmetric data and their alleged validity for research evaluation purposes. We will combine discourse analysis with bibliometric, webometric and altmetric methods, in such a way that both approaches also interrogate each other’s assumptions (Hicks & Potter 1991).

Our contribution will show, first of all, that altmetric data do not simply ‘represent’ other types of outputs; they also actively create a need for these types of information. These needs will have to be aligned with existing accountability regimes. Secondly, we will argue that researchers will develop forms of regulation that will partly be shaped by these new types of altmetric information. Researchers are not passive recipients of research evaluation but play an active role in assessment contexts (cf. Aksnes & Rip 2009; Van Noorden 2010). Thirdly, we will show that the emergence of altmetric data for evaluation is another instance (following the creation of the citation indexes and the use of web data in assessments) of transposing traces of communication into a framework of evaluation and assessment (Dahler-Larsen 2012, 2013; Wouters 2014).

By making explicit what the implications are of the transfer of altmetric data from the framework of the communication of science to the framework of research evaluation, we aim to contribute to a better understanding of the complex dynamics in which a new generation of researchers will have to work and be creative.

Aksnes, D. W., & Rip, A. (2009). Researchers’ perceptions of citations. Research Policy, 38(6), 895–905.

Beaulieu, A., van Heur, B. & de Rijcke, S. (2013). Authority and Expertise in New Sites of Knowledge Production. In A. Beaulieu, A. Scharnhorst, P. Wouters and S. Wyatt (Eds.), Virtual Knowledge – Experimenting in the Humanities and the Social Sciences (pp. 25-56). MIT Press.

Borup, M., Brown, N., Konrad, K. & van Lente, H. (2006). “The sociology of expectations in science and technology.” Technology Analysis & Strategic Management 18 (3/4), 285-98.

Brown, N. & Michael, M. (2003). “A sociology of expectations: Retrospecting prospects and prospecting retrospects.” Technology Analysis & Strategic Management 15 (1), 3-18.

Costas, R., Zahedi, Z. & Wouters, P. (n.d.). Do ‘altmetrics’ correlate with citations? Extensive comparison of altmetric indicators with citations from a multidisciplinary perspective.

Dahler-Larsen, P. (2012). The Evaluation Society. Stanford University Press.

Dahler-Larsen, P. (2013). Constitutive Effects of Performance Indicators. Public Management Review, (May), 1–18.

Galligan, F., & Dyas-Correia, S. (2013). Altmetrics: Rethinking the Way We Measure. Serials Review, 39(1), 56–61.

Hicks, D., & Potter, J. (1991). Sociology of Scientific Knowledge: A Reflexive Citation Analysis of Science Disciplines and Disciplining Science. Social Studies of Science, 21(3), 459–501.

Priem, J., Taraborelli, D., Groth, P., & Neylon, C. (2010). Altmetrics: a manifesto.

Van Noorden, R. (2010). “Metrics: A Profusion of Measures.” Nature, 465, 864–866.

Wouters, P., & Costas, R. (2012). Users, narcissism and control: Tracking the impact of scholarly publications in the 21st century. Utrecht: SURF foundation.

Wouters, P. (2014). The Citation: From Culture to Infrastructure. In B. Cronin & C. R. Sugimoto (Eds.), Next Generation Metrics: Harnessing Multidimensional Indicators Of Scholarly Performance (Vol. 22, pp. 48–66). MIT Press.

Wouters, P., Beaulieu, A., Scharnhorst, A., & Wyatt, S. (eds.) (2013). Virtual Knowledge – Experimenting in the Humanities and the Social Sciences. MIT Press.

Bibliometrics of individual researchers – the debate in Berlin

The lively debate we had at the ISSI conference in Vienna continued at the STI2013 conference, “Translational twists and turns: science as a socio-economic endeavour”, held 4-6 September in Berlin. A full plenary was devoted to the challenge of, and the dilemmas in, the application of bibliometrics to the (self)-evaluation of individual researchers, chaired by Ben Martin (SPRU). Martin opened the session with the tale of the rise and fall of a star researcher in economics in Germany. Based on a single dataset created in his PhD project, the economist published an impressive number of publications. Because he was so productive, he was able to attract more external research funding. When a German university was seeking to increase its chance of getting one of the Excellence Initiative grants, he seemed the perfect person to hire. A few members of the hiring committee then started to actually read his publications. They were all rather similar, which is not very surprising given that the research was all based on a single dataset from his PhD project. It turned out that he had published a large number of variations of basically the same article in different journals without anyone noticing these duplications. It was the beginning of the end. A number of journals began retracting these publications, although not with the cooperation of the researcher. This process is still ongoing. A sobering tale, according to Martin. He told the story at the start of the debate to warn against the misuses of performance indicators (such as the number of publications). For a recent overview of cases of fraud and Martin’s experiences as editor of Research Policy, see Martin (2013).

The plenary had a series of presentations, varying from the state of the debate, to examples of a portfolio approach to individual evaluation, to tensions in science policy with respect to indicator-based assessments, to the ethics of the evaluation of individual researchers. A report of the meeting will be published in the ISSI Newsletter shortly (Wouters et al. 2013). Here I wish to highlight the ethical questions which were the focus of Jochen Gläser’s presentation. Currently, there is no agreement on this in the field. It was even questioned whether we actually have an ethical problem. According to Peter van den Besselaar, we may have more of a knowledge problem than an ethical problem. Often, it is not clear what the different patterns in the indicator measurements mean. This is partly due to the fact that scientometricians often only use a very limited set of databases, such as Scopus or the Web of Science. According to Van den Besselaar, this makes it more difficult to make the measurements more robust. I agree that combining a variety of databases and other data sources (such as surveys or interviews or national statistical materials) is the way to go. The strongest studies in science studies have often used a diversity of materials.

Nevertheless, I don’t think that this absolves us from facing ethical dilemmas, in particular whenever individual researchers are being assessed with the help of metrics. In his presentation, Gläser discussed whether we need more explicit ethical guidelines. After all, the bibliometric centres have developed guidelines and include extensive explanations of the limits of their indicator reports. Moreover, the details of the performance indicators are also published in the bibliometric literature. Still, he argued in favour of more attention to the ethics of bibliometrics because the position of bibliometrics has changed over the years. He identified three relevant developments: an increased demand for bibliometric services in research management; the emergence of “amateur bibliometrics” thanks to the larger availability of data and indicators; and an increased effectiveness of bibliometrics due to more advanced indicators and increased availability of data sets (including web data). The scope of bibliometric practices is therefore increasing, and this requires a more explicit set of guidelines on how to apply bibliometric analyses. This holds for scientometric evaluation in general, but it is particularly pertinent when individual researchers are being assessed. Two indicators play an important role in these assessments, the h-index and the Journal Impact Factor, and neither of them is suited to this role (see Bornmann 2013 on the h-index). Gläser put forward a number of proposals. In the short term, he proposed to start collecting experiences and case descriptions in which things seem to go wrong with research assessments. In the medium term, he proposed to develop, as an expert community, a set of guidelines that are made available to research directors, managers, science policy officials and deans, and in which the field reaches some consensus with respect to the state of the art. He also supported a suggestion I had made in a parallel session in Berlin to create an Ombudsoffice for research evaluation. This office should be able to look into complaints about the use of bibliometrics by universities and institutes in research management.

We can expect that this debate will continue at the next indicator conferences.


Bornmann, L. (2013). A better alternative to the h index. Journal of Informetrics, 7(1), 100. doi:10.1016/j.joi.2012.09.004

Wouters, P. F., Glänzel, W., Gläser, J., & Rafols, I. (forthcoming 2013). The dilemmas of performance indicators of individual researchers – an urgent debate in bibliometrics. ISSI Newsletter.

Martin, B. R. (2013). Whither research integrity? Plagiarism, self-plagiarism and coercive citation in an age of research assessment. Research Policy, 42(5), 1005–1014. doi:10.1016/j.respol.2013.03.011

