
Evaluative bibliometrics

Bibliometrics is often used to evaluate research. In Sweden, part of the resources allocated to universities will be distributed on the basis of research performance, measured by the number of published articles and citations.


Stefan Carlstein
036-10 10 15

Similar systems are already in operation in several other countries. In Norway, for instance, bibliometrics has been used for this purpose for a number of years. Furthermore, bibliometric evaluation has a clear science policy dimension, as its inherent incentive structure in certain respects aims at steering publishing efforts towards certain publication types and channels.

Below is an introduction to research evaluation based on number of citations and number of publications (the Swedish model) and evaluation based on publication type and the "scholarly prestige" to which a given publication channel is ascribed (the Norwegian result-based redistribution model).

Citation analysis

Citations have a long tradition of forming the basis for the evaluation of published research output. The theoretical argument in favour of citation analysis is that researchers usually cite publications that form the basis of their own research. From this viewpoint, citations indicate which sources have contributed to the research front and are thus indicators of impact. Empirical studies tend to confirm this hypothesis (for an overview, see e.g. Bornmann & Daniel, 2008). However, citation analyses applied for evaluation purposes are controversial for at least three reasons: (1) Impact is not equivalent to quality, but rather a measurable aspect of quality. (2) From a social constructivist angle, citations have social, rhetorical or even political causes rather than being signs of contributions to contemporary research and/or its cognitive content. (3) Today, citation analysis for evaluation purposes can in principle only be a valid method within research areas where the research output is mainly published in international scholarly journals indexed in the database or databases used as data sources for the analysis (in practice, this usually means the citation indices supplied by Thomson Reuters). These requirements are not met for the humanities and certain areas of the social sciences (see further Moed, 2005, chapters 7-8). Described below are the basics of applied citation analysis, as well as how the effects of differing publishing practices between areas can be reduced in order to facilitate comparisons between those areas.


Generally, it is not particularly informative to report solely the number of citations, or the average citation frequency, of a particular unit such as a researcher or institution. The number of citations should be related to some reference group so that statements regarding the citation frequency of a unit (researcher or institution) are made in relative terms. Furthermore, the publications in the reference group must belong to the same subject area, be of the same publication type and be published in the same year. Three main methods are used to define such reference groups (Schubert & Braun 1996):

  1. Journal normalization - the publications in the journals wherein the publications of the analyzed unit are found.
  2. Field normalization - the publications within the subject fields wherein the publications of the analyzed unit are published. These are usually operationalized by the number of journals within the approximately 250 journal disciplines as defined in the citation databases supplied by Thomson Reuters.
  3. Ad hoc normalization - a reference group is created by some criterion intended to select a set of subject-related publications. The criterion can, for instance, be based on existing bibliographic classification systems (such as MeSH within medicine or PACS within physics) or on bibliographic coupling, that is, publications related to each other by mutually shared references.

Each of these three methods has its pros and cons. If (1) is used, two analyzed units within the same subject field may get similar normalized values despite the fact that the number of citations per publication is significantly larger for one of the units. This may occur when the unit with fewer citations per publication publishes, on the whole, in journals with a lower average number of citations.
Therefore, the most common approach today is to use method (2), either instead of or as a complement to journal normalization. However, some of the journal categories, and the subject areas pertaining to them, can be too heterogeneous to describe an area with a similar publishing and citation structure. The last method (3) is not applicable in most cases, since robust classification systems only exist for a few subject areas, and should any of them be used in a particular study it might be hard to replicate the outcome. Methods (1) and (2) should be regarded as the international standard.
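The idea behind field normalization can be illustrated with a minimal Python sketch. It assumes that each publication is a record with the hypothetical keys `field`, `type`, `year` and `citations`, and that the score of a unit is the mean of the ratios between observed and expected citations. Other aggregation choices exist (for example the ratio of the sums), so this is an illustration of the principle rather than the calculation used in any particular national model.

```python
from collections import defaultdict
from statistics import mean


def reference_means(reference_pubs):
    """Expected citations: the average per (field, type, year) reference group."""
    groups = defaultdict(list)
    for pub in reference_pubs:
        key = (pub["field"], pub["type"], pub["year"])
        groups[key].append(pub["citations"])
    return {key: mean(values) for key, values in groups.items()}


def normalized_citation_score(unit_pubs, ref_means):
    """Mean of observed/expected citation ratios for the unit's publications.

    Publications whose reference group is missing, or has an expected
    value of zero, are skipped (an assumption made for this sketch).
    """
    ratios = []
    for pub in unit_pubs:
        expected = ref_means.get((pub["field"], pub["type"], pub["year"]))
        if expected:
            ratios.append(pub["citations"] / expected)
    return mean(ratios) if ratios else None
```

A score of 1.0 means that the unit is cited as often as the average publication of the same field, type and year. Replacing `field` with the journal name in the grouping key would turn the same sketch into journal normalization.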

The two graphs below show the importance of taking into consideration the subject area and publishing year, and the need to regard the publication type. On the x-axis you see the publishing year and on the y-axis the average number of citations as measured in 2009:

The graph above (data from Essential Science Indicators) clearly shows that the expected average citation frequency for biology and biochemistry is considerably higher than for computer science. Therefore, you cannot make a comparison between these fields without first having performed some form of normalization as described above. Furthermore, it is not reasonable to compare publications within a given field that have different publishing years, as citations obviously accumulate over time. The graph below shows the need to consider the type of publication when normalization is performed:

The example above applies to the area "signal processing" (as defined by SciMago and Science Citation Index Expanded). Besides the effect of publishing year, it is also clear that the expected citation frequency for a given year is higher for review articles than for original articles, and the same holds for the difference between original articles and letters.

Apart from the composition of the normalization group, the size of the investigated unit is an important aspect of an analysis. Examples of aggregation levels are given below:

Macro level

  • Geopolitical regions
  • Countries
  • Broad subject areas and sub-divisions of them
  • Specific themes

Meso level

  • Universities
  • Research institutions
  • Journals
  • Research groups

Micro level

  • Smaller research groups
  • Individuals

The analyzed units are presented in decreasing order of population size and of the confidence that can be placed in the observed value of the citation-based bibliometric indicator used. The certainty or validity of indicators calculated at the macro level is usually greater than at the meso or micro level. This is due, among other things, to the "noise" caused by the random distribution of highly cited articles, as well as reliability problems inherent in the data collection process. A common rule of thumb, albeit an arbitrary one, asserts that an analysis has to include at least 50 publications so that the margin of error does not make a meaningful interpretation impossible. As stated earlier, a citation to a publication indicates scientific impact, and the aggregated (normalized) number of citations to the publications of an analyzed unit is often used as a rough approximation of the quality of those publications. This assumption has proven to be satisfactorily correct at the macro and meso levels (e.g. it correlates well with peer review). At the micro level, however, the same assumption may pose a problem, because factors such as randomness and negative citations, that is, a publication being cited because its results are questioned, have a greater influence on the analyzed unit. When larger materials are analyzed, those factors play a lesser role since they tend to cancel each other out.


So far, we have only discussed citation-based bibliometric methods. Another interesting variable in evaluation studies is the number of publications that an analyzed unit has published during a given time interval. Such indicators are intended to measure productivity rather than quality or influence. For a single analyzed unit, say an institution, a time series can be created to study its scientific production over time. If comparisons are made between analyzed units within the same field, the figures should be normalized for the size of the unit, for example by using person-years. A somewhat controversial question in this context concerns the definition of a publication. Which publications should be considered scientific, and which types should form the basis for a calculation of scientific productivity? Just as citation frequency is normalized with regard to subject area, one must take into consideration the fact that different fields do not have the same distribution of publication types. In the Swedish model only refereed articles in international journals, or rather a subset of them, i.e. those that are indexed in the citation indices provided by Thomson Reuters, are counted. The graph below shows an estimate of how many articles a Nordic researcher within different research areas produces over a period of four years.

On the basis of the data presented above, it is obvious that if only one type of publication is included in the calculation of productivity, account must be taken of the fact that publishing practice differs greatly between fields. Only certain fields have international journals as their main publishing channel, whereas other fields preferably publish their research in, for example, anthologies, books, conference papers, reports or non-English journals.
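The size normalization mentioned earlier, for example by person-years, can be sketched as follows. The function name and the input layout (a list of publication years and a mapping from year to person-years) are assumptions made for illustration. The sketch does not adjust for differences in publishing practice between fields, which would require a separate field-specific reference value.

```python
from collections import Counter


def publications_per_person_year(publication_years, person_years_by_year):
    """Time series of publications per person-year for one analyzed unit.

    publication_years: one entry per publication, giving its publishing year.
    person_years_by_year: mapping from year to the unit's person-years.
    Years with zero person-years are left out to avoid division by zero.
    """
    counts = Counter(publication_years)
    return {
        year: counts.get(year, 0) / person_years
        for year, person_years in sorted(person_years_by_year.items())
        if person_years > 0
    }
```

For example, a unit with two publications in one year and four person-years in that year gets a value of 0.5 publications per person-year.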

Read further here about the arguments behind the Swedish model and how normalized citation rate is calculated, and how this aspect of published research is combined with productivity calculations that are adjusted with respect to subject area.

An alternative model

An alternative to citation analysis of journal articles is found in Norway, where a bibliometric model that is very different from the Swedish one is used for research evaluation and the allocation of funds (Vekt på forskning 2004).
In this context, it might be noted that Stockholm University uses the Norwegian model at the institutional level, and a variant of the Norwegian model is also in preparation in Denmark. Just as in Sweden, attempts are made to combine productivity and impact measures. However, impact is not operationalized with citations, and more publication types than articles in international journals form the basis for the calculation of productivity. As a starting point, a strict definition of the term 'publication' is used, and impact is calculated on the basis of the scientific level attributed (by subject experts) to the publishing channel in which a given publication is found. The channels used in the model are publishing houses, journals, series, and web sites. To be accredited, a publication in an included channel must comply with the standard definition of a scientific publication, which requires, among other things, that it presents original research and that the channel in question has peer review routines.

Read further here about the Norwegian model, how publishing channels are sub-divided into levels, which publication types are valid ones, and how the used definition of impact and productivity is combined to allot publication points to an analyzed unit.
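The general principle of allotting publication points can be sketched in a few lines of Python. The publication types, the two channel levels and above all the weights in the table below are placeholder values chosen for illustration, not figures taken from this text, and the sketch leaves out any sharing of points between co-authors.

```python
# Hypothetical weights per (publication type, channel level).
# Placeholder values for illustration only.
ILLUSTRATIVE_WEIGHTS = {
    ("journal_article", 1): 1.0,
    ("journal_article", 2): 3.0,
    ("monograph", 1): 5.0,
    ("monograph", 2): 8.0,
}


def publication_points(publications, weights=ILLUSTRATIVE_WEIGHTS):
    """Sum the points for a unit's publications.

    Each publication is a dict with the hypothetical keys "type" and
    "level". Combinations that are missing from the weight table, such
    as non-accredited publication types, contribute zero points.
    """
    return sum(
        weights.get((pub["type"], pub["level"]), 0.0) for pub in publications
    )
```

The point of the sketch is that impact is fixed in advance through the weight table, independently of how often the publications are later cited.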

Thus, the Swedish and Norwegian models are fairly different. The Swedish model uses citations to measure impact and attention within the research community and only considers articles in journals indexed by Thomson Reuters. The Norwegian model attempts to measure impact on research through publishing channels that are defined and weighted in advance. The relative strength of the Norwegian model is perhaps that it is not tied to a specific database and accepts more publication types than the Swedish model. Its relative weakness is that impact is defined beforehand, that is, irrespective of how the publications are received by other researchers, as reflected in citations, for instance. What the two models have in common, though, is that the incentive structure rewards research which competes on the international scene.

Content updated 2017-01-10
