The proposed Swedish bibliometric model attempts to estimate the performance of Swedish universities in terms of scholarly publications in international journals.

Stefan Carlstein

stefan.carlstein@ju.se

036-10 10 15

This is done by combining normalized article production with the normalized citation rate. In this way, each university is attributed a value intended to reflect its scientific production and impact over a given time period.

The Swedish model attributes to each university a numerical value that is used as a distribution key for the direct state grants. The system is based entirely on metrics and thus does not use qualitative peer review. The stated reason is that a system that is lean on resources is desirable, meaning a system that does not burden the research community with collecting various kinds of information. The data sources are the citation indices SCI-E, SSCI and A&HCI, provided by Thomson Reuters. Only articles indexed in these databases form the basis for calculating the distribution key.

The first component of the system is the normalized citation rate, which is intended to estimate research impact. The normalization uses the approximately 250 journal classes of the citation indices, so that a journal article can be compared with similar articles internationally and so that comparisons between fields become possible. The normalization is performed at the article level, and fractionalization is used: if an article results from a collaboration between *n* universities, each university is allotted 1/*n* of the article. When the citation rate is calculated, the university's articles are weighted by their fractional value. Articles with few collaborators thus weigh more than those with many collaborators, the intention being to better represent the work effort. The calculation formula is:

$$\text{normalized citation rate} = \frac{\sum_i \dfrac{C_i}{[\mu_f]_i}\,[P_{frac}]_i}{\sum_i [P_{frac}]_i}$$

where $C_i$ denotes the number of citations to publication $i$, and $[\mu_f]_i$ is the average number of citations to articles of the same type (for example, original articles or review articles), published in the same year and in the same journal class or classes as article $i$. The fraction of publication $i$ that is ascribed to a given university is denoted $[P_{frac}]_i$.
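As an illustration, the sketch below computes the normalized citation rate as the fractionally weighted mean of $C_i/[\mu_f]_i$, which is how the description above reads. The `Article` class, its field names and the example numbers are hypothetical, and the reference values $[\mu_f]_i$ are simply given rather than derived from Thomson Reuters data.

```python
from dataclasses import dataclass


@dataclass
class Article:
    citations: int      # C_i: citations received by article i
    field_mean: float   # [mu_f]_i: world mean citations for same type, year and journal class(es)
    frac: float         # [P_frac]_i: the university's share, e.g. 1/n for n collaborating universities


def normalized_citation_rate(articles: list[Article]) -> float:
    """Fractionally weighted average of C_i / [mu_f]_i; 1.0 means cited at world average."""
    weighted_sum = sum(a.citations / a.field_mean * a.frac for a in articles)
    total_frac = sum(a.frac for a in articles)
    return weighted_sum / total_frac


# Hypothetical university with three articles.
articles = [
    Article(citations=10, field_mean=5.0, frac=1.0),   # sole university: twice the world average
    Article(citations=2, field_mean=4.0, frac=1 / 2),  # shared with one other university
    Article(citations=0, field_mean=3.0, frac=1 / 4),  # shared among four universities
]
print(round(normalized_citation_rate(articles), 3))  # 1.286
```

Because the fractions act as weights, an article shared with many universities moves the university's rate less than an article it produced alone.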

The second component is the normalized article production. Normalization is necessary because the coverage of the citation databases differs between fields. As stated earlier, one has to be aware that publishing traditions differ: in some fields research is published mainly in international journals, whereas other fields use other publication channels. Judging from the data provided by Thomson, chemists, for instance, seem to be more productive than humanists, which is hardly the case in reality (although a chemist does publish more articles in international journals).

Normalized article production is intended to estimate what a number of articles corresponds to in terms of the normal production of a researcher in the field, after the field's publishing traditions have been taken into account. The journal classes used to calculate the normalized citation rate cannot be used to normalize article production, since some of the classes contain too few articles to yield stable reference values with the method described below. Instead, cluster analysis is used to assign each journal to one of 34 clusters. These 34 clusters, or subject classes, are intended to represent broad research fields with similar working and publishing conditions, and a reference value is calculated for each cluster. To create the reference values, the distribution of articles per author in each field is assumed to follow a Waring distribution. By analyzing this distribution (the article publishing frequency of active Nordic authors during a given time period), one can estimate how many articles a researcher with a normal production (by Nordic standards) publishes in each field.

The method is best understood through an example. Suppose that a production distribution for a fictitious field is generated from the probability function of the Waring distribution with the parameters α = 4 and N = 5.
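Assuming the two-parameter form of the Waring distribution in which the probability of an author having no articles is $p_0 = \alpha/(N+\alpha)$ (an assumption about the parameterization, chosen because it matches the figures of the example), the probabilities and expected values for this fictitious field work out as follows:

```latex
p_0 = \frac{\alpha}{N+\alpha}, \qquad
p_k = p_{k-1}\,\frac{N+k-1}{N+\alpha+k} \quad (k \ge 1), \qquad
\mathrm{E}[X] = \frac{N}{\alpha - 1} \quad (\alpha > 1) \\[4pt]
\alpha = 4,\ N = 5: \quad
p_0 = \tfrac{4}{9} \approx 0.444, \qquad
5000\,p_0 \approx 2222, \qquad
\mathrm{E}[X] = \tfrac{5}{3} \approx 1.67
```

The simulated population described in the next paragraph (2,238 authors without articles and 1.66 articles per author among 5,000) lies close to these expected values.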

Because the example is fictitious, we know the number of authors who have not published anything during the time period: the author population consists of 5,000 researchers, of whom 2,238 have no articles, and a researcher produces on average 1.66 articles. In practice, however, there is usually no information on the number of researchers without articles during the time period, so these data have been removed from the example. The following calculations show how, based on the properties of the Waring distribution, we can nevertheless estimate how many articles a researcher in the field produces on average, as well as the size of the total author population in the fictitious research area.
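A fictitious population of this kind can be generated as sketched below. The sketch rests on assumptions: it uses the Waring recurrence $p_0 = \alpha/(N+\alpha)$, $p_k = p_{k-1}(N+k-1)/(N+\alpha+k)$, cuts the distribution off at a hypothetical `k_max` of 1,000 articles, and draws with Python's `random.choices`. The seed is arbitrary, so the counts will differ somewhat from the 2,238 non-publishing authors and 8,318 articles of the example.

```python
import random


def waring_pmf(alpha: float, n: float, k_max: int) -> list[float]:
    """Waring probabilities p_0..p_k_max (the tail beyond k_max is ignored)."""
    probs = [alpha / (n + alpha)]
    for k in range(1, k_max + 1):
        probs.append(probs[-1] * (n + k - 1) / (n + alpha + k))
    return probs


def simulate_field(alpha: float, n: float, authors: int, seed: int = 1, k_max: int = 1000) -> list[int]:
    """Article counts for a fictitious author population drawn from the Waring distribution."""
    rng = random.Random(seed)
    return rng.choices(range(k_max + 1), weights=waring_pmf(alpha, n, k_max), k=authors)


counts = simulate_field(alpha=4, n=5, authors=5000)
silent = counts.count(0)                  # known only because the field is fictitious
observed = [c for c in counts if c > 0]   # what a citation database would actually show
print(silent, sum(observed), sum(observed) / len(counts))
```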

The graph below shows the distribution of the data described above (where researchers with no published articles are not included). On the x-axis we see the number of articles and on the y-axis the number of authors.

To estimate how many articles a researcher in the given field publishes on average, we compute the point where the regression line through the s-truncated averages crosses the y-axis. The truncated averages are obtained by first calculating the average for all authors with one or more articles, then for authors with more than one article, and so on. The truncation threshold (number of articles) forms the x-axis and the corresponding averages the y-axis. A linear function is fitted by weighted least-squares regression (in the example, the weights are inversely proportional to the standard error of each truncated average). The graph below shows the result of this approach.
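Under the Waring form $p_0 = \alpha/(N+\alpha)$, $p_k = p_{k-1}(N+k-1)/(N+\alpha+k)$, the mean of the authors with at least $s$ articles is linear in $s$: $E[X \mid X \ge s] = (N + \alpha s)/(\alpha - 1)$. The line through the truncated averages therefore meets the y-axis at $N/(\alpha - 1)$, the mean including the authors without articles. The sketch below implements the procedure with weights $1/\mathrm{SE}$ as described. The function names, the cut-off of ten authors per point and the use of expected rather than sampled author counts are assumptions for illustration; with expected counts the points lie exactly on the line, whereas a sampled table scatters around it (hence 1.68 rather than 1.67 in the text).

```python
import math


def truncated_means(freq: dict[int, float], min_authors: float = 10) -> list[tuple[int, float, float]]:
    """(s, mean, standard error) for the authors with at least s articles, s = 1, 2, ...

    freq maps an article count k >= 1 to the number of authors with exactly k articles.
    Thresholds with fewer than min_authors authors are dropped as too noisy.
    """
    points = []
    for s in range(1, max(freq) + 1):
        tail = [(k, n) for k, n in freq.items() if k >= s]
        n_s = sum(n for _, n in tail)
        if n_s < min_authors:
            break
        mean = sum(k * n for k, n in tail) / n_s
        var = sum(n * (k - mean) ** 2 for k, n in tail) / (n_s - 1)
        se = math.sqrt(var / n_s)
        if se > 0:  # a zero standard error would give an infinite weight
            points.append((s, mean, se))
    return points


def weighted_line(points: list[tuple[int, float, float]]) -> tuple[float, float]:
    """Weighted least-squares line y = a + b*s with weights 1/SE; returns (a, b)."""
    w = [1 / se for _, _, se in points]
    sw = sum(w)
    x_bar = sum(wi * s for wi, (s, _, _) in zip(w, points)) / sw
    y_bar = sum(wi * y for wi, (_, y, _) in zip(w, points)) / sw
    sxy = sum(wi * (s - x_bar) * (y - y_bar) for wi, (s, y, _) in zip(w, points))
    sxx = sum(wi * (s - x_bar) ** 2 for wi, (s, _, _) in zip(w, points))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Expected author counts for the fictitious field (alpha = 4, N = 5, 5,000 authors),
# with the authors who have no articles left out, as in the text.
alpha, n_par, authors = 4, 5, 5000
p = alpha / (n_par + alpha)                      # p_0
freq = {}
for k in range(1, 5001):
    p *= (n_par + k - 1) / (n_par + alpha + k)   # p_k from p_{k-1}
    freq[k] = authors * p

intercept, slope = weighted_line(truncated_means(freq))
print(round(intercept, 3), round(slope, 3))      # 1.667 1.333, i.e. N/(alpha-1) and alpha/(alpha-1)
print(round(sum(k * n for k, n in freq.items()) / intercept))  # 5000: estimated author population
```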

The regression line intersects the y-axis at 1.68. This estimate of how many articles an average researcher in the field produces agrees very well with the actual value of 1.66, which is obtained from the full population of 5,000 authors, including those without articles. If the reference value is calculated in the same way for all fields, a fair comparison between them can be made. For example, a chemistry department's production of 56 articles could be approximately equivalent (in terms of the number of researchers with an average production that the articles correspond to) to that of a humanities department that has produced 4 articles during the same time period, given reference values of 2.22 and 0.16 respectively (the Nordic reference values according to HSV 2008:18R), since 56/2.22 ≈ 25 and 4/0.16 = 25. Incidentally, we can also estimate the total number of researchers in the fictitious field by dividing the number of articles (8,318 in this case) by the number of articles produced by the average researcher: 8,318/1.68 ≈ 4,951, which is a very good estimate of the actual 5,000. In a real case, empirical data may lead us to question whether the distribution assumption is reasonable, and we can hardly hope for estimates as exact as those in this fictitious example.
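The comparison and the population estimate in this paragraph are plain divisions; the sketch below reproduces them with the reference values quoted from HSV 2008:18R (2.22 for chemistry, 0.16 for the humanities). The function name `researcher_equivalents` is hypothetical.

```python
def researcher_equivalents(articles: float, reference_value: float) -> float:
    """Number of researchers with a normal production that the articles correspond to."""
    return articles / reference_value


chemistry = researcher_equivalents(56, 2.22)    # about 25.2
humanities = researcher_equivalents(4, 0.16)    # 25.0
authors = researcher_equivalents(8318, 1.68)    # fictitious field: about 4,951 authors
print(round(chemistry, 1), round(humanities, 1), round(authors))  # 25.2 25.0 4951
```

The population estimate uses the same function because dividing a field's total output by the output of an average researcher gives the number of researchers.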

We now have reference values for the number of citations received and for the number of articles published in different fields. For a given university, the normalized productivity per macro category is calculated by determining how the university's articles are distributed across the 34 categories and dividing each count by the corresponding reference value. The result is a normalized value that should, to a great extent, reflect all of the university's resources (direct state funds and external research grants), since increased resources result in more staff, who in turn produce articles. The value can also be interpreted as the number of researchers with an average production to whom the articles correspond. Before the normalized productivity values for the fields are added up, each is multiplied by the corresponding normalized citation rate (recall that this is a quotient equal to 1 if the articles are, on average, cited as much as the world average). After the summation, each university has a value that combines scientific productivity and scientific impact. Together with an indicator for external grants received, this value constitutes the basis for the allocation of direct funds.
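The combination step can be sketched as follows. This is a minimal illustration under stated assumptions: each field's article count (assumed here to be fractionalized) is divided by the field's reference value, multiplied by the university's normalized citation rate in that field, and summed. The external-grants indicator is left out, and the university, its article counts and the chemistry citation rate are invented; only the two reference values are the ones quoted from HSV 2008:18R.

```python
from dataclasses import dataclass


@dataclass
class FieldResult:
    articles: float          # university's article count in the field (assumed fractionalized)
    reference_value: float   # articles per researcher with a normal production in the field
    citation_rate: float     # normalized citation rate in the field (1.0 = world average)


def production_impact_value(fields: dict[str, FieldResult]) -> float:
    """Sum over fields of (articles / reference value) * normalized citation rate."""
    return sum(f.articles / f.reference_value * f.citation_rate for f in fields.values())


# Hypothetical university active in two of the 34 fields.
university = {
    "chemistry": FieldResult(articles=56, reference_value=2.22, citation_rate=1.2),
    "humanities": FieldResult(articles=4, reference_value=0.16, citation_rate=1.0),
}
print(round(production_impact_value(university), 2))  # 55.27
```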

An important consequence is that if a university can shift its publishing, particularly in fields such as the humanities, education and the social sciences, toward a larger share of articles in international journals, it will obtain a larger share of the allocated funds, provided that its normalized citation rate does not decrease considerably. There is one exception, however: for the humanities only the normalized article production counts (the normalized citation rate is set to 1, i.e. the world average), because citation data for the humanities are too volatile. One could even say that humanists are in a relatively advantageous position, since they have quite a low share of articles in the database and even a moderate increase should therefore have a large impact.
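A rough way to see why the humanities are well placed is to compare what one additional article adds in each field. The sketch below assumes the combination rule described above (articles divided by the reference value, times the normalized citation rate, with the rate fixed at 1 for the humanities) and uses the two Nordic reference values quoted earlier; the function names and the citation rates passed in are hypothetical.

```python
REFERENCE_VALUES = {"chemistry": 2.22, "humanities": 0.16}  # Nordic values quoted from HSV 2008:18R


def field_contribution(field: str, articles: float, citation_rate: float) -> float:
    """Researcher equivalents in the field, weighted by the normalized citation rate."""
    if field == "humanities":
        citation_rate = 1.0  # humanities: citation data too volatile, rate fixed at world average
    return articles / REFERENCE_VALUES[field] * citation_rate


def gain_per_extra_article(field: str, articles: float, citation_rate: float) -> float:
    """Increase in the field's contribution from publishing one more indexed article."""
    return (field_contribution(field, articles + 1, citation_rate)
            - field_contribution(field, articles, citation_rate))


print(round(gain_per_extra_article("chemistry", 56, 1.2), 2))   # 0.54
print(round(gain_per_extra_article("humanities", 4, 0.5), 2))   # 6.25: the 0.5 is ignored
```

The calculation ignores that the funds are distributed as shares among all universities, so the real effect also depends on what the other universities publish.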

- Braun, T., Glänzel, W., & Schubert, A. (1990). Publication productivity: From frequency distributions to scientometric indicators. Journal of Information Science, 16(1), 37-44.
- Resursutredningen. (2007). Resurser för kvalitet: Slutbetänkande. Stockholm: Fritze. (in Swedish only)
- Sandström, U., & Sandström, E. (2008). Resurser för citeringar. Stockholm: Högskoleverket. (in Swedish only)
- Utbildningsdepartementet. (2009). Uppdrag till Vetenskapsrådet att redovisa underlag för indikatorn vetenskaplig produktion och citering m.m. (U2009/322/F). (in Swedish only)

Content updated 2010-11-08