Empirical comparative study of similarity indexes in scientometrics co-authorship analysis
Abstract
Similarity indexes are widely used in scientometrics, in co-word, co-citation, bibliographic coupling, and co-authorship analysis, and more recently in link prediction and recommender systems. Despite the rich literature comparing these indexes, a consensus on the appropriateness of any specific one is rarely reached. This paper aims to enhance the empirical understanding of similarity indexes in the context of co-authorship networks, which are widely used and highly relevant in scientometrics. The objective is to help scientometricians analyze co-authorship networks and select the most suitable similarity index for their studies. The research examines two types of co-authorship networks, one with low density at the individual level and another with high density at the country level, using five commonly applied similarity indexes: Jaccard, Salton, Dice-Sørensen, Pearson, and Association Strength. The study confirms that, as theoretically expected, the Salton index is a concave increasing function of the Jaccard index, with Jaccard values consistently lower, regardless of network density. The concavity of the curve is more pronounced in the low-density network. A linear relationship is found between Dice-Sørensen and Salton. Additionally, Pearson is observed to be 'orthogonal' to Jaccard, Salton, and Dice-Sørensen, indicating a lack of direct correlation. Association Strength behaves differently depending on density: in the high-density network, it is 'orthogonal' to Jaccard, Salton, and Dice-Sørensen and shows no correlation with Pearson, whereas in the low-density network it displays the opposite behavior.
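As a minimal illustration of how four of these indexes relate for a single pair of authors, the sketch below uses the standard set-based definitions, with a and b the number of papers of each author and c the number of co-authored papers: Jaccard = c / (a + b - c), Salton = c / sqrt(a·b), Dice = 2c / (a + b), and Association Strength = c / (a·b). The function name `similarity_indexes` and the paper identifiers are hypothetical and are not taken from the study. Pearson is left out because it is computed on whole co-authorship profile vectors rather than on a single pair of counts.

```python
from math import sqrt


def similarity_indexes(papers_i, papers_j):
    """Return set-based similarity indexes for two authors.

    papers_i, papers_j: sets of paper identifiers (hypothetical data).
    """
    a, b = len(papers_i), len(papers_j)
    if a == 0 or b == 0:
        raise ValueError("each author needs at least one paper")
    c = len(papers_i & papers_j)  # number of co-authored papers
    return {
        "jaccard": c / (a + b - c),
        "salton": c / sqrt(a * b),
        "dice": 2 * c / (a + b),
        # Unscaled form; some authors multiply by a constant factor.
        "association_strength": c / (a * b),
    }


# Toy example: author A has 4 papers, author B has 9, and they share 2.
papers_a = {"p1", "p2", "p3", "p4"}
papers_b = {"p3", "p4", "p5", "p6", "p7", "p8", "p9", "p10", "p11"}

result = similarity_indexes(papers_a, papers_b)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

With these counts, Jaccard is 2/11, Dice is 4/13, and Salton is 1/3, so Jaccard < Dice < Salton. That ordering holds in general (up to equality), because a + b - c >= max(a, b) >= sqrt(a·b) and the arithmetic mean of a and b is never below their geometric mean. Dice is also an exact function of Jaccard, Dice = 2J / (1 + J).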