CWTS part of H2020 COST Action to stimulate integrity and responsible research

Good news came our way recently! Thed van Leeuwen, Paul Wouters and I will be part of an EC-funded H2020 COST Action on Promoting Integrity as an Integral Dimension of Excellence in Research (PRINTEGER). Main applicants Hub Zwart and Willem Halffman (Radboud University Nijmegen) brought together highly skilled partners for this network from the Free University Brussels, the University of Tartu (Estonia), Oslo and Akershus University College, Leiden University, and the Universities of Bonn, Bristol, and Trento.

The primary goal of the COST Action is to encourage a research culture that treats integrity as an integral part of doing research, rather than as an externally driven steering mechanism. Our starting point: in order to stimulate integrity and responsible research, new forms of governance are needed that are firmly grounded in and informed by research practice.

Concretely, the work entailed in the project will consist of A) a systematic review of integrity cultures and practices; B) an analysis and assessment of current challenges, pressures, and opportunities for research integrity in a demanding and rapidly changing research system; and C) the development and testing of tools and policy recommendations enabling key players to effectively address issues of integrity, specifically directed at science policy makers, research managers and future researchers.

CWTS will contribute to the network with:

  • A bibliometric analysis of ‘traces of fraud’ (e.g. retracted articles, manipulative editorials, non-existent authors and papers, fake journals, bogus conferences, non-existent universities), set against the background of general shifts in publication patterns, such as changing co-authoring practices, instruments as authors, or the rise of hyper-productive authors (a minimal sketch of how such retraction notices might be retrieved follows this list);
  • Two in-depth case studies of research misconduct, not of the evident or spectacular kind, but reflecting the dilemmas and conflicts that occur in grey areas. Every partner will provide two cases; ours will most likely focus on cases of questionable integrity among journal editors (for example, impact factor manipulation);
  • Acting as task leader on the formulation of advice for research support organisations, including IT tools. This task will draw conclusions from the research on the operation of the research system, specifically publication infrastructures such as journals, libraries, and data repositories;
  • Like all other partners in the network, setting up small local advisory panels consisting of five to ten key stakeholders of the project: research policy makers, research leaders or managers, research support organisations, and early career scientists. These panels will meet for a scoping consultation at the start of the project, for a halfway consultation to discuss intermediate results and further choices to be made, and for a near-end consultation to test the pertinence of tools and advice at a point where we can still make changes to accommodate stakeholder input.
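
As an illustration of the first task, the sketch below shows one way a sample of retraction notices could be collected. It is a minimal example under stated assumptions, not the PRINTEGER methodology: it assumes Crossref’s public REST API and its update-type filter, and the exact field names may change.

```python
import requests

# Minimal sketch: fetch a sample of retraction notices from Crossref's
# public REST API. The 'update-type:retraction' filter is an assumption
# based on the current public API documentation.
response = requests.get(
    "https://api.crossref.org/works",
    params={"filter": "update-type:retraction", "rows": 20},
    timeout=30,
)
response.raise_for_status()

for item in response.json()["message"]["items"]:
    # 'title' is a list in Crossref records and may be empty.
    title = (item.get("title") or ["<untitled>"])[0]
    doi = item.get("DOI", "?")
    print(f"{doi}  {title[:80]}")
```

A real analysis would of course need to combine several such sources and normalise against overall publication volumes, as the bullet above indicates.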

Knowledge, control and capitalism: what epistemic capitalism makes us see

Guest blog post by Thomas Franssen

On February 5th, Max Fochler (University of Vienna) gave a talk during an extended EPIC research group seminar at CWTS in Leiden. Fochler posed a crucial and critical question regarding knowledge production in the 21st century: can we understand contemporary practices of knowledge production in and outside academia as practices of epistemic capitalism? With this term, defined as ‘the accumulation of capital through the act of doing research’, Max wanted to address the regimes of worth that play a crucial role in life science research in Austria.

Max was interested in the concept of capitalism because it denotes both forms of ascribing worth or value to something (in this case, to knowledge and to doing research) and the sets of practices in which these forms of worth are embedded. In this way it allows one to talk about which registers or regimes of value are visible, as well as about the institutional contexts in which these forms of worth ‘count’ for something.

Using research on the life sciences (partly done with Ulrike Felt and Ruth Müller), Max compared the regimes of value found in biographical interviews with postdocs working in Austrian academia to those of founders, managers and employees of research-active small biotech companies in Austria.

Results showed that the postdocs in this study are preoccupied with their own future employability, and that they assess their own worth in terms of the capital they can accumulate. This capital consists of publications, impact factors, citations and grant money. What is especially critical in this respect is that potential sites of work, social relations with others, and choices for particular research topics or model organisms are all scrutinized for the effect they might have on the accumulation of capital. Importantly, also for research policy and higher education studies, this is the only strategy this sample of postdocs sees as viable. They do not see other regimes of valuation available to them. As such, they either comply with the rules of the game or opt out of the academic system entirely.

In biotech companies the situation is very different. The accumulation of epistemic capital plays a smaller role in the biographies of those working for biotech companies. The main difference, Max observed, is that failure and success are attributed to companies rather than individuals, and the competition and focus on the individual that postdocs experience in the life sciences is far less intense in biotech. As such, the essence of working in biotech is not the accumulation of capital but the development of the company. Capital is not an end in itself but is used strategically when possible.

Thinking through epistemic capitalism with biodiversity

Esther Turnhout (Wageningen University) was invited by Sarah to comment on Max Fochler’s talk. Turnhout’s research focuses on information and accountability infrastructures in forest certification and auditing in global value chains. She started her response by asking whether the concept of epistemic capitalism made her look at her own case study materials differently, and if so, how – not to interrogate the concept and test it empirically, but rather to make clear what it highlights and what it affords.

Her criticism of the term came down to two aspects, which she explained using the case of biodiversity: the concept of epistemic capitalism ties the development of knowledge directly to the accumulation of capital, and it has a tendency to reduce everything it captures to one single mechanism or logic.

To make her case, Esther traced the knowledge-making practices in biodiversity research historically, focusing on the rise of so-called ecosystem services. Within the ecosystem services framework, biodiversity knowledge has become mainly utilitarian, and biodiversity itself an object that presents economic value because it has not yet been destroyed. Think for example of forest carbon, which represents a value on the carbon market as long as it remains locked in the forest itself.

So here, in the commodification of biodiversity, knowledge and capital are again closely related. This, however, is not the main argument that Esther took from the example. Rather, she argued that ecosystem services are in many ways very similar to earlier forms of biodiversity knowledge. In all cases, the knowledge produced must be rendered technical, is assumed to be linear, and privileges scientific expertise. More importantly, there is a preoccupation with ‘complete knowledge’, which is seen as necessary for effective conservation. This type of knowledge is also increasingly used for managerial purposes, to measure the success or effectiveness of policy.

As such, disconnected from capitalist or economic concerns, three logics come together in biodiversity knowledge: a technocratic logic, a managerial logic, and a logic of control. For her case, a focus on epistemic capitalism and the accumulation of capital does not work so well. The technocratic ideal of total control would disappear from view if ecosystem services were regarded only as a commodification of nature. It is the issue of control, which can be understood from a range of logics (technocratic, managerial, even aesthetic), that currently prevents urgently needed action. This is because there is an experienced lack of ‘total information’, a total which – seen from technocratic and managerial logics – is needed to act. According to Turnhout, it is this utopian ideal of ‘technocratic control through complete information’ that should be criticised much more strongly.

In Search of Excellence? Debating the Merits of Introducing an Elite Dutch University Model

Report by Alex Rushforth

Should the Netherlands strive for excellence in its university system? Or will maintaining quality suffice? This was the topic of a recent panel debate at the WTMC annual meeting on 21 November 2014 in De Balie, Amsterdam. Organised and chaired by Willem Halffman, the session focused on an article published by Barend van der Meulen in the national newspaper De Volkskrant, which advocated creating two excellent universities that excel in internationally published rankings, thereby adding a new top tier to the Dutch higher education system.

Both van der Meulen and Halffman presented their views, with an opposing position also coming from Sally Wyatt. Completing the panel, CWTS’s very own Paul Wouters provided results from recent empirical work about rankings.

Barend van der Meulen’s call for an elite university stemmed from the fact that Dutch universities perennially sit outside the top 50 of the Shanghai and Times Higher Education rankings. For him the message is clear: the Netherlands is repeatedly failing to enhance its reputation as an elite player among global universities, a position which ought to cause concern. Van der Meulen stated that his call for an elite university model is part of a need to create an expanded repertoire of what universities are and what they should do in the Netherlands. The pursuit of rankings through this vehicle is therefore tightly coupled with a rejection of the status quo. Rankings are a social technology which ought to be harnessed for quality improvement and for promoting democratic participation, by equipping students and policymakers with tools to make judgments and exert some influence over universities. Alternative modes of evaluation like peer review are closed systems in which only other academics can make judgments, leaving university activities unaccountable to external modes of evaluation. This ‘ivory tower’ situation, reminiscent of the 1980s, is an image van der Meulen wishes to escape, as ultimately it damages the credibility and legitimacy of universities. The reliance on public money for research and education makes the moral case for university improvement and accountability particularly pressing in the Netherlands. For van der Meulen, the ‘good enough’ university (see Wyatt’s argument below) is not enough, given that excellence is imposing itself as a viable and increasingly important alternative.

First to oppose the motion in favour of elite universities was Willem Halffman, whose talk built on a reply co-authored with Roland Bal, also in De Volkskrant. Halffman questioned the very foundations of the idea that ‘excellence’ ought to be pursued. Drawing unflattering comparisons between the research budget of Harvard University and that of the entire Netherlands, he argued that competing within a global superleague would require a radical expansion of existing research budgets and wage structures across the Dutch university system, which he felt was unrealistic and unreasonable against a backdrop of crisis in public finances. Halffman also questioned the desirability of ranking systems that reproduce national elites and promote academic stars, and the consequences this brings for institutions of science in general and Dutch universities in particular. Football-style league tables provide a poor model for rating universities: in sport a winner-takes-all logic is central, but for universities, which embody a broad repertoire of societal functions, it is not clear what ‘winning’ means or how it could be made visible and commensurable through performance indicators.

Sally Wyatt recounted the shock she encountered when studying and working in British universities in the 1980s, having grown up in Canada in a period of prosperity and social mobility. These experiences fired a series of warning shots against going down the road of pursuing excellence. Her move to the Netherlands in 1999 promised an oasis away from the turmoil the British university system had faced as a result of Thatcherite policy reforms. With the emergence of the Research Assessment Exercise (RAE) and its ranking logic came a rise in managerial positions and policies, a decline in working conditions, and a widening gender gap. Gone, in the Netherlands, was also the latent class system engrained in the culture of British universities, where dominant elite institutions are sites of social stratification reproduced across generations – a stratification which rankings merely encourage and reinforce. Despite the erosion of certain positive attributes of Dutch universities since her arrival, Wyatt argued that the Dutch system still preserves enough of a ‘level playing field’ in terms of funding allocation to merit fierce resistance to any introduction of an elite university model. For Wyatt it is sometimes better to promote the ‘good enough’ than to chase an imperialist and elitist vision of ‘excellence’.

Drawing on work on university and hospital rankings carried out with Sarah de Rijcke (CWTS), Iris Wallenburg and Roland Bal (Erasmus MC, Rotterdam), Paul Wouters’ talk advocated more fine-grained STS investigations into the kinds of work that go into rankings, who is doing it, and in what situations. What is at stake in studying rankings, then, is not simply the critique of this or that tool, but a more pervasive (and sometimes invisible) logic and set of practices encountered across public organisations like universities and hospitals. Wouters advocated combining audit society critiques (which tend to be top-down) with STS insights into how ranking is practiced across various organisational levels in universities. This would provide a more promising platform from which to inform debates of the kind playing out over the desirability of the elite university.

The contrast between positions was stark. Are rankings – these seemingly ubiquitous ordering mechanisms of contemporary social life – something the Netherlands can afford to back away from in governing its universities? If they are being pursued anyway, shouldn’t policy intervene and assist a more systematic climb up the rankings, enabling more pronounced successes? Or is it necessary to oppose the very notion that the Netherlands needs to excel in a ‘globally competitive’ race, particularly given the seeming arbitrariness of many of the metrics through which prestige gets attributed via ranking mechanisms? Despite polarization on what is to be done, the potential for extending STS’s conceptual and empirical apparatus to mediate these discussions seemed to strike a chord with panelists and audience alike. No doubt this stimulating debate touches on a set of issues that will not go away quickly, and on which the WTMC community is surely well placed to intervene.

Ethics and misconduct – Review of a play organized by the Young Academy (KNAW)

This is a guest blog post by Joost Kosten. Joost is a PhD student at CWTS and a member of the EPIC working group. His research focuses on the use of research indicators from the perspective of public policy. Joost obtained an MSc in Public Administration (Leiden University) and was also trained in Political Science (Stockholm University) and Law (VU University Amsterdam).

Scientific (mis)conduct – The sins, the drama, the identification

On Tuesday November 18th 2014, the Young Academy of the Royal Netherlands Academy of Sciences organized a performance of the play Gewetenschap by Tony Maples at Leiden University. Pandemonia Science Theater is currently touring the Netherlands to perform the piece at several universities. Gewetenschap was inspired by troubles with respect to ethics and integrity which recently occurred in Dutch science and scholarship. Although those cases concerned grave violations of the scientific code of conduct (i.e., the cardinal sins of fraud, fabrication, and plagiarism), the play focuses on common dilemmas in a researcher’s everyday life. The title Gewetenschap is a made-up word combining the Dutch geweten (conscience) and wetenschap (science).

The playwright used confidential interviews with members of the Young Academy to gain insight into the ethical dilemmas researchers most frequently have to deal with. Professor Karin de Zwaan is a research group leader who has hardly any time to do research herself. She puts much effort into organizing grants, attracting new students and running her research group. Post-doc Jeroen Dreef is a very active researcher who does not have enough time to take his organizational responsibilities seriously. A tenure-track position is all he wants. Given their other important activities, Karin and Jeroen hardly have any time to supervise PhD student Lotte. One could question the type of support they do give her.

Judging by the reactions to several scenes, the audience clearly recognized the topics presented. Afterwards, the dilemmas touched upon during the play were taken up by prof. Bas Haring, who led a discussion of the following topics:

  • Is there a conflict between the research topics a researcher personally prefers and what the research group expects him or her to do?
  • In one of the scenes, the researchers were delighted because a publication had been accepted. Haring asked whether that exhibits “natural behaviour”: shouldn’t a researcher be happy with good results rather than with a publication being accepted? One of the participants replied that a publication functions as a reward.
  • What do you do with your data? Is endlessly applying a diversity of analysis methods until you find nice results a responsible approach?
  • What about impact factors (IF)? Bas Haring himself said his IF is 0: “Do you think I am an idiot?” What role do numbers such as the IF play in your opinion of colleagues? There seemed to be quite a diversity of opinions. An early-career researcher said everyone knows these numbers are nonsense. An experienced scientist pointed out that there is a correlation between scores and quality. Someone else expressed optimism, expecting the focus on numbers to be over within ten years; this prompted another to respond that in the past there was competition too, but in a different way.
  • When is someone a co-author? This question resulted in a lively debate. Apparently, there are considerable differences from field to field. In the medical fields, a co-authorship can be a way to express gratitude to people who have played a vital role in a research project, such as those who could organize experimental subjects. In this way, a co-authorship becomes a tradeable commodity. A medicine professor pointed out that in his field co-authorships can be used to check a curriculum vitae against the development of a researcher’s status, and can thus serve as a criterion for judging grant proposals: a good researcher should start with co-authorships in first position, later have co-authorships somewhere in between the first and last author, and end his career with papers on which he is last author. The further a career has developed, the closer the name should be to the end of the author list. Another participant stated that one can deal with co-authorships in three different ways: 1. co-authors always take full responsibility for everything in the paper; 2. similar to the credits at the end of a movie, the paper clarifies what each co-author’s contribution was; 3. only those who really contributed to writing the paper can be co-authors. The participant admitted that this last proposal works in his own field but might not work in others.
  • Can a researcher exaggerate his findings when presenting them to journalists? Should you keep control over a journalist’s work to avoid your results being presented differently? Is it acceptable to present untrue information to support your case, just because a proper scientific argument would be too complex for the man in the street?
  • Is it acceptable to present your work as having more societal relevance than you really expect? One of the reactions was that researchers are forced to express the societal relevance of their work when they apply for a grant, while from the very nature of scientific research it is hardly possible to indicate clearly what society will gain from the results.
  • What does a good relationship between a PhD student and a supervisor look like? What is a good balance between serving the interests of PhD students, serving organizational interests (e.g. securing the future of the organization by attracting new students and grants), and the researcher’s own interests?

The discussion did not touch on the following dilemmas presented in Gewetenschap:

  • To what extent are requirements for grant proposals contradictory? On the one hand researchers are expected to think ‘out of the box’, while on the other they should meet a large number of requirements. Moreover, should one propose new ideas, with the risks that come along with them, or is it better to walk the beaten path in order to guarantee success?
  • Should colleagues who did not show respect be repaid in kind when you have a chance to review their work? Should you always judge scientific work on its merits? Are there any principles of ‘due process’ which should guide peer review?
  • Who owns the data if someone who contributed to them moves to another research group or institute?


Quality in the age of the impact factor

ISIS, the most prestigious journal in the field of the history of science, moved house last September: its central office is now located at the Descartes Centre for the History and Philosophy of the Sciences and Humanities at Utrecht University. The Dutch science historian H. Floris Cohen took up the position of editor-in-chief of the journal. No doubt this underlines the international reputation of the community of historians of science in the Netherlands. Being the editor of the central journal in one’s field surely is a mark of esteem and quality.

The opening of the editorial office in Utrecht was celebrated with a symposium entitled “Quality in the age of the impact factor”. Since the quality of research in history is intimately intertwined with the quality of writing, it seemed particularly apt to call attention to the role of impact factors in the humanities. I used the occasion to pose the question of how we actually define scientific and scholarly quality. How do we recognize quality in our daily practices? How can this variety of practices be understood theoretically? And which approaches in the field of science and technology studies are most relevant?
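
For readers outside bibliometrics it may help to recall what the metric in the symposium’s title actually measures. In its standard form (notation mine), a journal’s impact factor for year $y$ is

\[
\mathrm{JIF}_y \;=\; \frac{C_y(y-1) + C_y(y-2)}{N_{y-1} + N_{y-2}}
\]

where $C_y(x)$ is the number of citations received in year $y$ by items the journal published in year $x$, and $N_x$ is the number of citable items the journal published in year $x$. A two-year, journal-level average of this kind says little about any individual article, let alone about quality in fields such as history, where books and long-term scholarship carry much of the intellectual weight.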

In the same month, Pleun van Arensbergen defended a very interesting PhD dissertation dealing with some of these issues, “Talent Proof. Selection Processes in Research Funding and Careers”. Van Arensbergen did her thesis work at the Rathenau Institute in The Hague. The quality of research is increasingly seen as mainly the result of the quality of the people involved. Hence, universities “have openly made it one of their main goals to attract scientific talent” (van Arensbergen, 2014, p. 121). A specific characteristic of this “war for talent” in the academic world is that there is an oversupply of talents and a relative lack of career opportunities, leading to a “war between talents”. The dissertation is a thorough analysis of success factors in academic careers: an empirical analysis of how the Dutch science foundation NWO selects early career talent in its Innovational Research Incentives Scheme. The study surveyed researchers about their definitions of quality and talent, and combines this with an analysis of both the outcome and the process of talent selection. Van Arensbergen paid specific attention to the gender distribution and to the differences between successful and unsuccessful applicants.

Her results point to a discrepancy between the common notion among researchers that talent is immediately recognizable (“you know it when you see it”) and the fact that there are very small differences between candidates who get funded and those who do not. The top and the bottom of the quality distribution of proposals and candidates are relatively easy to detect, but the group of “good” and “very good” proposals is still too large to be funded. Van Arensbergen and her colleagues did not find a “natural threshold” above which the successful talents can be placed. On the contrary, in one of the chapters they find that researchers who leave the academic system due to a lack of career possibilities regularly score higher on a number of quality indicators than those who are able to continue a research career: “This study does not confirm that the university system always preserves the highly productive researchers, as leavers were even found to outperform the stayers in the final career phase” (van Arensbergen, 2014, p. 125).

Based on the survey, her case studies and her interviews, Van Arensbergen also concludes that productivity and publication records have become rather important for academic careers. “Quality nowadays seems to a large extent to be defined as productivity. Universities seem to have internalized the performance culture and rhetoric to such an extent that academics even define and regulate themselves in terms of dominant performance indicators like numbers of publications, citations or the H-index. (…) Publishing seems to have become the goal of academic labour.” (van Arensbergen, 2014, p. 125). This does not mean, however, that these indicators determine the success of a career. The study questions “the overpowering significance assigned to these performance measures in the debate, as they were not found to be entirely decisive.” (van Arensbergen, 2014, p. 126) An extensive publication record is a condition but not a guarantee for success.

This relates to another finding: the group processes of panel discussions are also very important. With a variety of examples, Van Arensbergen shows how the organization of the selection process shapes the outcome. The face-to-face interview of the candidate with the panel, for example, is crucial for the final decision. The influence of the external peer reports, by contrast, was found to be modest.

A third finding of the talent dissertation is that success in obtaining grants feeds back into one’s scientific and scholarly career. This creates a self-reinforcing mechanism, which the sociologist of science Robert Merton named the Matthew effect after the Bible verse: “For unto every one that hath shall be given, and he shall have abundance: but from him that hath not shall be taken away even that which he hath” (Merton, 1968). Van Arensbergen concludes that differences between scholars may thus initially be small but will increase in the course of time as a result of funding decisions: “Panel decisions convert minor differences in quality into enlarged differences in recognition.”
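
The dynamic is easy to reproduce in a toy model. The sketch below illustrates the general mechanism only – it is not based on Van Arensbergen’s data, and all parameters are invented for illustration: a cohort starts with nearly identical quality, and noisy panel decisions award grants that raise each winner’s future chances.

```python
import random

random.seed(42)

NUM_RESEARCHERS = 100
NUM_ROUNDS = 20        # funding rounds
GRANTS_PER_ROUND = 10  # grants awarded each round
GRANT_BONUS = 0.05     # recognition added by each grant won
NOISE = 0.1            # imperfect panel judgment

# Nearly identical starting "quality": tiny random differences.
quality = [1.0 + random.gauss(0, 0.01) for _ in range(NUM_RESEARCHERS)]
recognition = quality[:]  # panels judge accumulated recognition, not raw quality

for _ in range(NUM_ROUNDS):
    # Panels rank candidates on noisy perceptions of recognition.
    ranked = sorted(
        range(NUM_RESEARCHERS),
        key=lambda i: recognition[i] + random.gauss(0, NOISE),
        reverse=True,
    )
    # Winners accumulate symbolic capital, improving future chances.
    for i in ranked[:GRANTS_PER_ROUND]:
        recognition[i] += GRANT_BONUS

spread_start = max(quality) - min(quality)
spread_end = max(recognition) - min(recognition)
print(f"initial spread: {spread_start:.3f}, final spread: {spread_end:.3f}")
```

Because winners’ recognition grows while the underlying quality stays put, repeated rounds concentrate grants in a small group whose final recognition far exceeds the initial differences – minor differences converted into enlarged recognition, as the dissertation puts it.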

Combining these three findings leads to some interesting conclusions regarding how we actually define and shape quality in academia. Although panel decisions about whom to fund are strongly shaped by the organization of the selection process as well as by a host of other contextual factors (including chance), and although all researchers are aware of the uncertainties in these decisions, this does not mean that these decisions are given less weight. On the contrary, obtaining external grants has become a cornerstone of successful academic careers. Universities even devote considerable resources to making their researchers better able to acquire prestigious grants as well as external funding in general. Although this is clearly instrumental for the organization, Van Arensbergen argues that grants have become part of the symbolic capital of a researcher and research group, and she refers to Pierre Bourdieu’s theory of symbolic capital to better understand the implications.

This brings me to my short lecture at the opening of the editorial office of ISIS in Utrecht. Although experts on bibliometric indicators do not generally see the Journal Impact Factor as an indicator of quality, socially it seems to partly function as one. But indicators are not alone in shaping how we in practice identify, and thereby define, talent and quality. They flow together with the way quality assurance and measurement processes are organized, the social psychology of panel discussions, the extent to which researchers are visible in their networks, and so on. In these complex contextual interactions, indicators do not determine outcomes; they are ascribed meaning depending on the situation in which the researchers find themselves.

A good way to think about this, in my view, has been developed in the field of material semiotics. This approach, which has its roots in the French actor-network theory of Bruno Latour and Michel Callon, does not accept a fundamental rupture in reality between the material and the symbolic: reality as such is the result of complex and interacting translation processes. This is an excellent philosophical basis for understanding how scientific and scholarly quality emerges. I see quality not as an attribute of an academic persona or of a particular piece of work, but as the result of the interaction between a researcher (or a manuscript) and the already existing scientific or scholarly infrastructure (e.g. the body of published studies). If this interaction creates a productive friction (meaning that there is enough novelty in the contribution, but not so much that it is incompatible with the already existing body of work), we see the work or scholar as of high quality. In other words, quality simply does not (yet) exist outside the systems of quality measurement. The implication is that quality itself is a historical category: not an invariant, but a culturally and historically specific concept that changes and morphs over time. In fact, the history of science is the history of quality. I hope historians of science will take up the challenge of mapping this history with more empirical and theoretical sophistication than has been done so far.

Literature:

Merton, R. K. (1968). The Matthew Effect in Science. Science, 159(3810), 56–63.

Van Arensbergen, P. (2014). Talent proof: Selection processes in research funding and careers. The Hague, Netherlands: Rathenau Institute. Retrieved from http://www.worldcat.org/title/talent-proof-selection-processes-in-research-funding-and-careers/oclc/890766139&referer=brief_results


Developing guiding principles and standards in the field of evaluation – lessons learned

This is a guest blog post by professor Peter Dahler-Larsen. The reflections below are a follow-up to his keynote at the STI conference in Leiden (3-5 September 2014) and the special session at STI on the development of quality standards for science & technology indicators. Dahler-Larsen holds a chair at the Department of Political Science, University of Copenhagen. He is a former president of the European Evaluation Society and the author of The Evaluation Society (Stanford University Press, 2012).

Lessons learned about the development of guiding principles and standards in the field of evaluation – A personal reflection

Professor Peter Dahler-Larsen, 5 October 2014

Guidelines are symbolic, not regulatory

The limited institutional status of guiding principles and standards should be understood as a starting point for the debate. In the initial phases of the development of such standards and guidelines, people often have very strong views. But only the state can enforce laws. To the extent that guidelines and standards merely express the official views of a professional association that has no institutional power to enforce them, they will have limited direct consequences for practitioners. The discussion becomes clearer once it is recognized that standards and guidelines thus primarily have a symbolic and communicative function, not a regulatory one. Practitioners remain free to practice however they like, even after guidelines have been adopted.

Design a process of debate and involvement

All members of a professional association should have the possibility to comment on a draft version of the guidelines or standards. An important component in their adoption is the design of a proper organizational process that involves the composition of a draft by a select group of recognized experts, an open debate among members, and an official procedure for adopting the standards or guidelines as organizational policy.

Acknowledge the difference between minimum and maximum standards

Minimum standards must be complied with in all situations. Maximum standards are ideal principles worth striving for, although they will not be fully accomplished in any particular situation. It often turns out that a set of guidelines contains many maximum principles, although that is not what most people understand by “standards.” For that reason I personally prefer the term guidelines or guiding principles rather than “standards.”

Think carefully about guidelines and methodological pluralism

Advocates of a particular method often think that the methodological rules connected to their own method define quality as such for the whole field. For that reason, they are likely to insert their own methodological rules into the set of guidelines. As a consequence, guidelines can be used politically to promote one set of methods or one particular paradigm over another. Great care should be exercised in the formulation of guidelines to make sure that pluralism remains protected. In evaluation, for example, the rule is that if you subscribe to a particular method, you should have high competence in the chosen method – but that goes for all methods.

Get beyond the “but that’s obvious” argument

Some argue that it is futile to formulate a set of guidelines because, at that level of generality, it is only possible to state some very broad and obvious principles with which every sensible person must agree. The argument sounds plausible when you hear it, but my experience suggests otherwise, for a number of reasons. First, some people have just not thought about a very bad practice (for example, doing evaluation without written Terms of Reference). Once you see that someone has formulated a guideline against it, you are likely to start paying attention to the problem. Just because a principle is obvious to some does not mean that it is obvious to all. Second, although there may be general agreement about a principle (such as “do no unnecessary harm” or “take general social welfare into account”), there can be strong disagreement about the interpretations and implications of the principle in practice. Third, a good set of guiding principles will often comprise at least two principles that are somewhat in tension with each other, for example the principle of being quick and useful versus the principle of being scientifically rigorous. Sorting out exactly which kind of tension between these two principles one can live with in a concrete case turns out to be a matter of complicated professional judgment. So, get beyond the “that’s obvious” argument.

Recognize the fruitful uses of guidelines

Among the most important uses of guidelines in evaluation are:

– In application situations, good evaluators can explain their practice with reference to broader principles

– In conferences, guidelines can stimulate insightful professional discussions about how to handle complicated cases

– Books and journals can use guidelines as inspiration for developing ethical awareness among practitioners; see, for example, Michael Morris’s work in the field of evaluation.

– Guidelines are of great use in teaching and in other forms of socialization of evaluators.

Respect the multiplicity of organizations

If, say, the European Evaluation Society wants to adopt a set of guidelines, it should respect that the German and Swiss associations already have their own guidelines. Furthermore, some professional associations (say, of psychologists) also have guidelines. A professional association should take such overlaps seriously and find ways to exchange views and experiences with guidelines across national and organizational borders.

Professionals are not alone, but relations can be described in guidelines, too

It is often argued that one of the major problems behind bad evaluation practice is the behavior of commissioners. Some therefore think that guidelines describing good evaluation practice are in vain until the behavior of commissioners (and perhaps other users of evaluation) is included in the guidelines, too. However, there is no particular reason why guidelines cannot describe a good relation and a good interaction between commissioners and evaluators. Remember, guidelines have no regulatory power; they merely express the official norms of the professional association. Evaluators are allowed to express what they think a good commissioner should or should not do. In fact, explicit guidelines can help clarify mutual and reciprocal role expectations.

Allow for regular reflection, evaluation and revision of guidelines

At regular intervals, guidelines should be debated, evaluated and revised. The AEA guiding principles, for example, have been revised and now reflect values regarding culturally competent evaluation that were not in earlier versions. Guidelines are organic and reflect a particular socio-historical situation.

Sources:

Michael Morris (2008). Evaluation Ethics for Best Practice. Guilford Press.

American Evaluation Association. Guiding Principles for Evaluators.

The Leiden manifesto in the making: proposal of a set of principles on the use of assessment metrics at the S&T indicators conference

Summary

A set of guiding principles (a manifesto) on the use of quantitative metrics in research assessment was proposed by Diana Hicks (Georgia Tech) during a panel session on quality standards for S&T indicators at the STI conference in Leiden last week. Various participants in the debate agreed on the responsibility of the scientometric community to better support the use of scientometrics. Finding the choice of specific indicators too constraining, many voices supported the idea of a joint publication of a set of principles to guide the responsible use of quantitative metrics. The session also included calls for scientometricians to take a more proactive role as engaged and responsible stakeholders in the development and monitoring of metrics for research assessment, as well as in wider debates on data governance, such as infrastructure and ownership.

At the close of the conference, ENID (European Network of Indicators Designers), the association of scientometric institutes, and its president Ton van Raan offered to play a coordinating role in writing up and publishing a consensus version of the manifesto.

Full report of the plenary session at the 2014 STI conference in Leiden on Quality standards for evaluation: Any chance of a dream come true?

The need to debate these issues has come to the forefront in light of reports that the use of certain easy-to-use and potentially misleading metrics for evaluative purposes has become a routine part of academic life, despite misgivings within the profession itself about their validity. A central aim of the special session was to discuss the need for a concerted response from the scientometric community to produce more explicit guidelines and expert advice on good scientometric practices. The session continued from the 2013 ISSI and STI conferences in Vienna and Berlin, where full plenary sessions were convened on the need for standards in evaluative bibliometrics and on the ethical and policy implications of individual-level bibliometrics.

This year’s plenary session started with a summary by Ludo Waltman (CWTS) of the pre-conference workshop on technical aspects of advanced bibliometric indicators. The workshop, co-organised by Ludo, was attended by some 25 participants. Topics addressed included 1. advanced bibliometric indicators (strengths and weaknesses of different types of indicators; field normalization; country-level and institutional-level comparisons); 2. statistical inference in bibliometric analysis; and 3. journal impact metrics (strengths and weaknesses of different journal impact metrics; use of these metrics in the assessment of individual researchers). The workshop discussions were very fruitful and some common ground was found, but significant differences of opinion also remained. Topics that need further discussion are the technical and mathematical properties of indicators (e.g., ranking consistency); strong correlations between indicators; the need to distinguish between technical issues and usage issues; purely descriptive approaches vs. statistical approaches; and the importance of user perspectives for technical aspects of indicator production. There was a clear interest in continuing these discussions at a next conference. The slides of the workshop are available on request.
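
To give one concrete example of what “field normalization” involves (my illustration, not part of Ludo’s summary): a common approach is to divide a publication’s citation count by the average citation count of publications from the same field, year and document type, and then average these ratios over a unit’s publications, as in the mean normalized citation score

\[
\mathrm{MNCS} \;=\; \frac{1}{n}\sum_{i=1}^{n}\frac{c_i}{e_i},
\]

where $c_i$ is the number of citations received by publication $i$ and $e_i$ is the expected number of citations for publications of the same field, publication year and document type. A value of 1 means the unit is cited at the world average; much of the technical debate (how to delineate fields, whether to average ratios or take a ratio of averages) concerns the construction of $e_i$.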

Ludo’s summary was followed by a short talk by Sarah de Rijcke (CWTS) to set the scene for the ensuing panel discussion. Sarah provided a historical explanation of why previous responses by the scientometric community regarding misuses of performance metrics and the need for standards have fallen on deaf ears. Evoking Paul Wouters’ and Peter Dahler-Larsen’s introductory and keynote lectures, she argued that the preferred normative position of scientometrics (‘We measure, you decide’) and the tendency to provide upstream solutions no longer serve the double role of the field very well. As an academic as well as a regulatory discipline, scientometrics not only creates reliable knowledge on metrics but also produces social technologies for research governance. As such, evaluative metrics attain meaning in a certain context, and they also help shape that context. Though parts of the community now acknowledge that there is indeed a ‘social’ problem, ethical issues are often either conveniently bracketed off or ascribed to ‘users lacking knowledge’. This reveals an unease with taking any other-than-technical responsibility. Sarah plugged the idea of a short joint statement on proper uses of evaluative metrics, proposed at the international workshop at OST in Paris (12 May 2014). She concluded with a plea for a more long-term reconsideration of the field’s normative position. If the world of research governance is indeed a collective responsibility, then scientometrics should step up and accept its part. This would put the community in a much better position to engage productively with stakeholders in developing good practices.

In the ensuing panel discussion, Stephen Curry (professor of Structural Biology at Imperial College, London, and member of the HEFCE steering group) expressed deep concern about the seductive power of metrics in research assessment and saw a shared, collective responsibility for the creation and use of metrics on the side of bibliometricians, researchers and publishers alike. According to him, the technical and usage aspects of indicators should therefore not be separated artificially.

Lisa Colledge (representing Elsevier as Snowballmetrics project director) presented the Snowballmetrics initiative as a bottom-up, practical approach aimed at meeting the needs of funding organizations and university senior management. According to Lisa, while the initiative primarily addresses research officers, feedback from the academic bibliometrics community is highly appreciated as a contribution to the empowerment of indicator users.

Stephanie Haustein (University of Montreal) was not convinced that social media metrics (a.k.a. altmetrics) lend themselves to standardization, due to the heterogeneity of data sources (tweets, views, downloads) and their constantly changing nature. She stated that the meaning of altmetrics data is highly ambiguous (attention vs. significance) and that quality control similar to the peer review system of scientific publications does not yet exist.

Jonathan Adams (chief scientist at Digital Science) endorsed the idea of producing a statement but emphasized that it would have to be short, precise and clear in order to catch the attention of government bodies, funding agencies and senior university management, who are uninterested in technical details. Standards would also have to keep up with fast-paced change (data availability, technological innovations). He was critical of any fixed set of indicators, since this would not accommodate the strategic interests of every organization.

Diana Hicks (Georgia Institute of Technology) presented a first draft of a set of statements (the “Leiden Manifesto”), which she proposed should be published in a top-tier journal like Nature or Science. The statements are general principles on how scientometric indicators should be used, for example ‘Metrics properly used support assessments; they do not substitute for judgment’ and ‘Metrics should align with strategic goals’.

In the ensuing debate, many participants in the audience proposed initiatives and identified problems that need to be solved. These were partially summarized by Paul Wouters, who identified four issues around which the debate revolved. First, he proposed that a central issue is the connection between assessment procedures and the primary process of knowledge creation. If this connection is severed, assessments lose part of their usefulness for researchers and scholars.

The second question is what kind of standards are desirable. Who sets them? How open are they to new developments and different stakeholders? How comprehensive and transparent are standards, and how comprehensive and transparent should they be? Which interests and assumptions are included within them? In the debate it became clear that scientometricians do not want to determine the standards themselves. Yet standards are being developed by database providers and universities, which are now busy building new research information systems. Wouters proposed that the scientometric community set as its goal to monitor and analyze these evolving standards. This could help to better understand problems and pitfalls, and also provide technical documentation.

The third issue highlighted by Wouters is the question of who is responsible. While the scientometric community cannot assume full responsibility for all evaluations in which scientometric data and indicators play a role, it can certainly broaden its agenda. Perhaps an even more fundamental question is how public stakeholders can remain in control of, and responsible for, publicly funded science when more and more meta-data is being privatized. Wouters pleaded for strengthening the public nature of the meta-data infrastructure, including current research information systems, publication databases and citation indexes. This view does not deny the important role of for-profit companies, which are often more innovative. Fourth, Wouters suggested that these issues taken together provide an inspiring collective research agenda for the scientometrics community.

Diana Hicks’ suggestion of a manifesto or set of principles was followed up on the second day of the STI conference at the annual meeting of ENID (European Network of Indicators Designers). The ENID assembly, with Ton van Raan as president, offered to play a coordinating role in writing up the statement. Diana Hicks’ draft will serve as a basis, and the statement will also be informed by opinions from the community, important stakeholders and intermediary organisations, as well as those affected by evaluations. The debate on standardization and use will continue at upcoming science policy conferences, with a session confirmed for the AAAS meeting (San José, February) and sessions expected at the STI and ISSI conferences in 2015.

(Thanks to Sabrina Petersohn for sharing her notes of the debate.)

Ismael Rafols (Ingenio (CSIC-UPV) & SPRU (Sussex); Session chair); Sarah de Rijcke (CWTS, Leiden University); Paul Wouters (CWTS, Leiden University)
