Still using the Hirsch index? Don’t!

“My research: > 185 papers, h-index 40.” A random quote from a curriculum vitae on the web. Researchers sometimes love their Hirsch index, better known as the h-index. But what does the measure actually mean? Is it a reliable indicator of scientific impact?

Our colleagues Ludo Waltman and Nees Jan van Eck have studied the mathematical and statistical properties of the h-index. Their conclusion: the h-index can produce inconsistent results. For this reason, it is not the reliable measure of scientific impact that most users think it is. As a leading scientometric institute, we therefore advise all universities, funders, and academies of science to abandon the use of the h-index as a measure of the overall scientific impact of researchers or research groups. There are better alternatives. The paper by Waltman and Van Eck is now available as a preprint and will soon be published in the Journal of the American Society for Information Science and Technology (JASIST).

The h-index combines productivity and citation impact in a single number. It is calculated by ranking a researcher's publications by the number of citations each has received. For example, someone who has an h-index of 40 has published at least 40 articles that have each been cited at least 40 times, while the remaining articles have been cited no more than 40 times each. The higher the h-index, the better.
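To make the definition concrete, here is a minimal sketch in Python of how an h-index can be computed from a list of citation counts. The function name and the citation counts are made up for illustration; they do not come from the Waltman and Van Eck paper.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(counts, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Made-up citation counts: the three most-cited papers each have
# at least 3 citations, the fourth has only 2, so h = 3.
print(h_index([10, 9, 9, 2]))  # -> 3
```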

The h-index was proposed by physicist Jorge Hirsch in 2005. It was an immediate hit. Nowadays, there are about 40 variants of the h-index. About one quarter of all articles published in the main scientometric journals have cited Hirsch's article in which he describes the h-index. Even more important has been the response by scientific researchers using the h-index. The h-index has many fans, especially in fields that exchange many citations, such as the biomedical sciences. The h-index is almost irresistible because it seems to enable a simple comparison of the scientific impact of different researchers. Many institutions have been seduced by the siren call of the h-index. For example, the Royal Netherlands Academy of Arts and Sciences (KNAW) asks for the value of the h-index on its recent forms for new members. Individual researchers can look up their h-index, based on Google Scholar documents, via Harzing's website Publish or Perish. Both economists and computer scientists have produced rankings of their fields based on the h-index.

Our colleagues Waltman and Van Eck have now shown that the h-index has some fatal shortcomings. For example, if two researchers with different h-indices co-author a number of papers together, this may lead to a reversal of their positions in an h-index based ranking. The same may happen when we compare research groups. Suppose we have two groups and each member of group A has a higher h-index than a paired researcher in group B. We would expect the h-index of group A as a group to be higher than that of group B as well. That does not have to be the case, as the sketch below illustrates. Please note that we are speaking here of a calculation of the h-index based on a complete and reliable record of documents and citations. The problematic nature of the data when one uses Google Scholar as a data source is a different matter. So even when we have complete and accurate data, the h-index may produce inconsistent results. Surely, this is not what one wants when using the index for evaluation purposes!
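Here is a stylized numerical illustration of the group paradox. The citation counts are invented for illustration and are not taken from the Waltman and Van Eck paper: each member of group A has a higher h-index than each member of group B, yet once the publication lists are pooled, group B comes out ahead.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    counts = sorted(citations, reverse=True)
    return max((rank for rank, c in enumerate(counts, start=1) if c >= rank),
               default=0)

# Group A: two researchers, each with three papers cited 3 times -> h = 3 each.
group_a = [[3, 3, 3], [3, 3, 3]]
# Group B: two researchers, each with two papers cited 10 times -> h = 2 each.
group_b = [[10, 10], [10, 10]]

for label, group in (("A", group_a), ("B", group_b)):
    print(label, "individual h-indices:", [h_index(member) for member in group])
    pooled = [c for member in group for c in member]  # merge publication lists
    print(label, "as a group:", h_index(pooled))

# Every member of group A outscores every member of group B (3 vs. 2),
# yet group B's combined h-index (4) exceeds group A's (3).
```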

At CWTS, we have therefore drawn the conclusion that the h-index should not be used as a measure of scientific impact in the context of research evaluation.


Not much news in new Shanghai rankings

Two weeks before the start of the 2011 academic season, the latest edition of the Academic Ranking of World Universities (ARWU) was published. The response to this ranking in the Netherlands is telling about the importance ascribed to global university rankings. Utrecht University saw its position improve by 2 places and moved up to number 48. Leiden University went up 15 places and, at number 65, is now the second Dutch university after Utrecht. All Dutch universities are now listed among the 500 “best universities” in the world. The association of Dutch universities, VSNU, was thrilled. This was an “excellent performance”, according to the organization, because “the Shanghai Ranking is in itself already a selection of the five hundred best universities in the world. This means that the Dutch universities belong to the best 3 percent of the total universities in the world (17,000).” In our view, this shows that the VSNU has not really understood the point of this ranking and the rationales behind its construction.

All measurements are preceded by decisions about the object(s) and focus of measurement. In this categorization process, certain factors are labeled as relevant and others as less relevant or irrelevant. Decisions are made about the parameters of the categories that will be taken into account. These decisions fundamentally shape the subsequent measurements. The ARWU ranking is based on data for 1,000 universities (the other 16,000 are not taken into account). The ranking strongly favours large universities. Because Nobel Prizes and Fields Medals have a strong impact on the total ranking, while other prestigious prizes are not taken into account, the ARWU favours Anglo-Saxon universities and universities focused on the exact and medical sciences. Since its start in 2003, the ARWU ranking has been led by US universities, with Harvard as number one. The only non-US universities among the top ten are Oxford and Cambridge.

The way research performance is measured in the Shanghai ranking is also problematic. The number of articles in the journals Nature and Science determines 20% of the ranking score, whereas prestigious monodisciplinary journals such as Cell or Physica Acta do not weigh as heavily. Influential humanities researchers are almost invisible in the ranking. Just before the summer, the European University Association pointed to the disadvantages of the most popular global university rankings. In fact, they only rank the elite of the international university system. Moreover, composite rankings like the Shanghai Ranking merge different aspects of university performance (research, teaching, valorization, social impact) into one number. How this composite number is calculated is rather arbitrary and not always transparent. It is therefore unclear to what extent a change in position has anything to do with a change in performance.

For example, it is quite certain that the small improvement of Utrecht University is a fluctuation without any significance. Moreover, even a seemingly robust improvement in the performance of a university can be caused by an individual outlier. According to the website Transfer, the three Dutch universities whose positions improved most strongly had three individual researchers to thank for it. Radboud University went up thanks to Nobel Prize winner Konstantin Novoselov. Eindhoven University of Technology should send flowers to computer scientist Wil van der Aalst, and Maastricht rose thanks to behavioural psychologist Gerjo Kok. The fact that individual researchers can have such a strong influence on the position of a university in this ranking may trigger all sorts of perverse behaviour, such as trying to lure staff away from a competing university.
