Abstract
Systematic literature reviews (SLR) are commonly undertaken by researchers to stay informed of the latest developments in a particular topic, but this manual process is demanding and can locate and analyze only a limited number of articles. We propose a data analytics-based SLR protocol and a set of semi-automated tools that leverage recent advances in data analytics to facilitate a more effective, objective, and comprehensive SLR process. Our protocol incorporates scraping tools to collect articles from seven bibliographic databases, and uses text analytics, social network analysis, natural language processing, citation analysis, and main path analysis to analyze a large number of articles. To demonstrate its utility, we apply the protocol to the topic of “information diffusion in social networks”. The results reveal 11 latent topics under this broad domain along with the most critical articles for each topic, and the connections among the associated 1,229 articles and their references.
Notes
Our protocol can also be used on other free or subscriber-only academic databases (e.g., Web of Science and Scopus).
As of June 4, 2021, Microsoft has announced that Microsoft Academic Service will be discontinued at the end of 2021 – see https://www.microsoft.com/en-us/research/project/academic/articles/microsoft-academic-to-expand-horizons-with-community-driven-approach/, last accessed July 12, 2021.
More details on these techniques can be found at “Refine web searches – Google Search Help” (https://support.google.com/websearch/answer/2466433?hl=en).
To help users distinguish the two types of keywords (author-assigned keywords and LDA-Gensim generated keywords) and obtain the top topics, we split the analysis into two networks rather than build one massive, cluttered network. Combining the two maps may change the nodes’ edge degrees, which may in turn affect the top-topic results. For example, if we combine the two keyword maps, articles that include author-assigned keywords will have higher edge degrees than those that do not, leading to a biased result.
The nodes are assigned a six-digit HEX color number by cluster (see Appendix – Fig. 7: Color Palette of Nodes).
We increase the edge degree level gradually until the number of nodes is close to 50. For example, in Fig. 5e, when the edge degree is equal to 22 and 23, there are 48 and 53 nodes respectively. We pick the number that is closest to 50.
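The threshold search described in the last note can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' tool: the function name `pick_degree_threshold`, the rule that a node is kept when its degree is at least the threshold, the tie-breaking toward the lower threshold, and the sample degree list are all hypothetical.

```python
def pick_degree_threshold(degrees, target=50):
    """Return (threshold, node_count) whose node count is closest to target.

    Assumption: a node survives the filter when its degree >= threshold.
    Ties are resolved in favor of the lower threshold.
    """
    best = None  # (gap, threshold, count)
    # Raise the threshold one step at a time, as the note describes.
    for threshold in range(0, max(degrees) + 1):
        count = sum(1 for d in degrees if d >= threshold)
        gap = abs(count - target)
        if best is None or gap < best[0]:
            best = (gap, threshold, count)
    return best[1], best[2]


# Synthetic degrees (not data from the article): 48 hubs, 5 mid-degree
# nodes, and 20 low-degree nodes.
sample_degrees = [40] * 48 + [12] * 5 + [3] * 20
print(pick_degree_threshold(sample_degrees))  # (13, 48)
```

With this synthetic input the candidate counts are 73, 53, and 48 nodes, and 48 is the closest to the target of 50, mirroring the choice between 48 and 53 nodes in the note.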
Ethics declarations
Conflicts of Interest
The authors declare that they have no conflict of interest and no affiliations with or involvement in any organization or entity with any financial or non-financial interest in the subject matter or materials discussed in this manuscript. No funding was received to assist with the preparation of this manuscript.
About this article
Cite this article
Xiong, R.R., Liu, C.Z. & Choo, KK.R. Synthesizing Knowledge through A Data Analytics-Based Systematic Literature Review Protocol. Inf Syst Front (2023). https://doi.org/10.1007/s10796-023-10432-3
DOI: https://doi.org/10.1007/s10796-023-10432-3