Synthesizing Knowledge through A Data Analytics-Based Systematic Literature Review Protocol

Xiong, Rachael Ruizhu; Liu, Charles Zhechao; Choo, Kim-Kwang Raymond

doi:10.1007/s10796-023-10432-3

Published: 09 October 2023

Information Systems Frontiers

Rachael Ruizhu Xiong¹,
Charles Zhechao Liu² &
Kim-Kwang Raymond Choo²

Abstract

Systematic literature reviews (SLRs) are commonly undertaken by researchers to stay informed of the latest developments in a particular topic, but this manual process is demanding and can only locate and analyze a limited number of articles. We propose a data analytics-based SLR protocol and a set of semi-automated tools that leverage the latest advances in data analytics to facilitate a more effective, objective, and comprehensive SLR process. Our protocol incorporates scraping tools to collect articles from seven bibliographic databases, and uses text analytics, social network analysis, natural language processing, citation analysis, and main path analysis to analyze a large number of articles. To demonstrate its utility, we apply the protocol to the topic of “information diffusion in social networks”. The results reveal 11 latent topics under this broad domain, along with the most critical articles for each topic and the connections among the associated 1,229 articles and their references.
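
As an illustration of main path analysis, the last technique named in the abstract, the following minimal Python sketch traces a main path through a small citation network. It assumes search path count (SPC) edge weights and a global search that maximizes the summed SPC along a source-to-sink path; the abstract does not state which variant the protocol uses, and the function name `global_main_path` and the toy citation pairs are hypothetical.

```python
from functools import lru_cache


def global_main_path(citations):
    """Return (path, total_spc) for a citation DAG.

    `citations` is a list of (cited, citing) pairs, so knowledge flows
    from the cited article to the citing article. Assumes the network
    is acyclic and small enough for plain recursion.
    """
    succ, pred = {}, {}
    for cited, citing in citations:
        succ.setdefault(cited, []).append(citing)
        pred.setdefault(citing, []).append(cited)
    nodes = set(succ) | set(pred)

    @lru_cache(maxsize=None)
    def paths_in(node):
        # Number of paths that reach `node` from any source article.
        parents = pred.get(node, [])
        return 1 if not parents else sum(paths_in(p) for p in parents)

    @lru_cache(maxsize=None)
    def paths_out(node):
        # Number of paths that lead from `node` to any sink article.
        children = succ.get(node, [])
        return 1 if not children else sum(paths_out(c) for c in children)

    def spc(u, v):
        # Search path count: source-to-sink paths that traverse edge (u, v).
        return paths_in(u) * paths_out(v)

    @lru_cache(maxsize=None)
    def best_from(node):
        # Best (summed SPC, path) from `node` down to a sink.
        children = succ.get(node, [])
        if not children:
            return 0, (node,)
        options = []
        for child in children:
            total, tail = best_from(child)
            options.append((spc(node, child) + total, (node,) + tail))
        return max(options)  # ties fall back to lexicographic path order

    sources = [n for n in nodes if n not in pred]
    total, path = max(best_from(s) for s in sources)
    return list(path), total


# Hypothetical toy network: A is the oldest article, E the newest.
toy_citations = [
    ("A", "B"), ("A", "D"), ("B", "C"),
    ("B", "D"), ("C", "E"), ("D", "E"),
]
path, total = global_main_path(toy_citations)
print(path, total)  # ['A', 'B', 'D', 'E'] 5
```

In this toy network the edges A→B and D→E each lie on two of the three source-to-sink paths, so the path through them accumulates the largest weight. A local search that follows the heaviest outgoing edge at each step is another common variant and can return several tied branches.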

Notes

Our protocol can also be used on other free or subscriber-only academic databases (e.g., Web of Science and Scopus).
As of June 4, 2021, Microsoft had announced that the Microsoft Academic service would be discontinued at the end of 2021; see https://www.microsoft.com/en-us/research/project/academic/articles/microsoft-academic-to-expand-horizons-with-community-driven-approach/ (last accessed July 12, 2021).
More details on these techniques can be found at “Refine web searches – Google Search Help” (https://support.google.com/websearch/answer/2466433?hl=en).
To help users distinguish the two types of keywords (author-assigned keywords and LDA-Gensim generated keywords) and obtain the top topics, we separated the analysis into two networks to avoid a massive and messy network. Combining the two maps may affect the nodes’ edge degrees, which may in turn affect the top-topic results. For example, if we combine the two keyword maps, articles that include author-assigned keywords will have higher edge degrees than those that do not, leading to a biased result.
The nodes are assigned a six-digit HEX color number by cluster (see Appendix – Fig. 7: Color Palette of Nodes).
We increase the edge degree level gradually until the number of nodes is close to 50. For example, in Fig. 5e, when the edge degree is equal to 22 and 23, there are 48 and 53 nodes, respectively. We pick the edge degree whose node count is closest to 50.
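
The threshold selection described in the last note can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' tooling: it assumes that "edge degree" means the number of distinct keywords a keyword co-occurs with, and that the filter keeps nodes whose degree is at least the threshold. The function names and sample keywords are hypothetical.

```python
from itertools import combinations


def keyword_degrees(article_keywords):
    """Degree of each keyword in a co-occurrence network.

    Two keywords are linked when they appear in the same article
    (an assumed network definition).
    """
    neighbors = {}
    for keywords in article_keywords:
        unique = sorted(set(keywords))
        for keyword in unique:
            neighbors.setdefault(keyword, set())  # keep isolated keywords
        for a, b in combinations(unique, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {keyword: len(linked) for keyword, linked in neighbors.items()}


def pick_degree_threshold(degrees, target=50):
    """Return (threshold, node_count) with node_count closest to target.

    Scans thresholds upward and keeps nodes with degree >= threshold;
    on a tie the lower threshold is kept.
    """
    best = None
    for threshold in range(max(degrees.values(), default=0) + 1):
        count = sum(1 for d in degrees.values() if d >= threshold)
        if best is None or abs(count - target) < abs(best[1] - target):
            best = (threshold, count)
    return best


# Hypothetical keyword lists for five articles.
sample = [
    ["diffusion", "influence", "cascade"],
    ["influence", "cascade"],
    ["diffusion", "rumor"],
    ["rumor", "epidemic"],
    ["influence", "seed"],
]
degrees = keyword_degrees(sample)
print(pick_degree_threshold(degrees, target=4))  # (2, 4)
```

With the default `target=50`, the same function returns the degree level whose surviving node count is nearest to 50, which mirrors the choice between 48 and 53 nodes in the note's example.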

Author information

Authors and Affiliations

Department of Management, College of Business, Kansas State University, Manhattan, KS, USA
Rachael Ruizhu Xiong
Department of Information Systems and Cyber Security, Alvarez College of Business, University of Texas at San Antonio, San Antonio, TX, USA
Charles Zhechao Liu & Kim-Kwang Raymond Choo

Corresponding author

Correspondence to Rachael Ruizhu Xiong.

Ethics declarations

Conflicts of Interest

The authors declare that they have no conflict of interest and no affiliations with or involvement in any organization or entity with any financial or non-financial interest in the subject matter or materials discussed in this manuscript. No funding was received to assist with the preparation of this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Figure 7: Color Palette of Nodes

Figure 8

Figure 9

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xiong, R.R., Liu, C.Z. & Choo, KK.R. Synthesizing Knowledge through A Data Analytics-Based Systematic Literature Review Protocol. Inf Syst Front (2023). https://doi.org/10.1007/s10796-023-10432-3

Accepted: 02 September 2023
Published: 09 October 2023
DOI: https://doi.org/10.1007/s10796-023-10432-3
