Synthesizing Knowledge through A Data Analytics-Based Systematic Literature Review Protocol

Information Systems Frontiers

Abstract

Systematic literature reviews (SLRs) are commonly undertaken by researchers to stay informed of the latest developments in a particular topic, but this manual process is demanding and can only locate and analyze a limited number of articles. We propose a data analytics-based SLR protocol and a set of semi-automated tools that leverage the latest advances in data analytics to facilitate a more effective, objective, and comprehensive SLR process. Our protocol uses scraping tools to collect articles from seven bibliographic databases, and applies text analytics, social network analysis, natural language processing, citation analysis, and main path analysis to analyze a large number of articles. To demonstrate its utility, we apply the protocol to the topic of “information diffusion in social networks”. The results reveal 11 latent topics under this broad domain, the most critical articles for each topic, and the connections among the associated 1,229 articles and their references.

Notes

  1. Our protocol can also be used on other free or subscriber-only academic databases (e.g., Web of Science and Scopus).

  2. As of June 4, 2021, Microsoft has announced that Microsoft Academic Service will be discontinued at the end of 2021 – see https://www.microsoft.com/en-us/research/project/academic/articles/microsoft-academic-to-expand-horizons-with-community-driven-approach/, last accessed July 12, 2021.

  3. More details on these techniques can be found at “Refine web searches – Google Search Help” (https://support.google.com/websearch/answer/2466433?hl=en).

  4. To help users distinguish the two types of keywords (author-assigned keywords and LDA-Gensim generated keywords) and obtain the top topics, we split the analysis into two networks rather than build one massive, cluttered network. Combining the two maps may change the nodes’ edge degrees and therefore the resulting top topics. For example, if we combine the two keyword maps, articles that include author-assigned keywords will have higher edge degrees than those that do not, leading to a biased result.

  5. Each node is assigned a six-digit hex color code according to its cluster (see Appendix, Fig. 7: Color Palette of Nodes).

  6. We raise the edge degree threshold gradually until the number of nodes is close to 50. For example, in Fig. 5e, when the edge degree is equal to 22 and 23, there are 48 and 53 nodes, respectively. We pick the edge degree whose node count is closest to 50.
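
The selection rule in Note 6 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' tool: the function name `pick_degree_threshold`, the rule that a node is kept when its edge degree is at least the threshold, the tie-breaking choice, and the toy degree data are all hypothetical.

```python
from typing import Dict, Tuple


def pick_degree_threshold(degrees: Dict[str, int], target: int = 50) -> Tuple[int, int]:
    """Return (threshold, node_count) with node_count closest to `target`.

    Assumption: a node survives the filter when its edge degree is
    greater than or equal to the threshold. On a tie, the lower
    threshold (the larger network) is kept.
    """
    # Threshold 0 keeps every node; this is the starting candidate.
    best_threshold, best_count = 0, len(degrees)
    max_degree = max(degrees.values(), default=0)

    for threshold in range(1, max_degree + 1):
        count = sum(1 for d in degrees.values() if d >= threshold)
        if abs(count - target) < abs(best_count - target):
            best_threshold, best_count = threshold, count
        if count < target:
            # Raising the threshold further can only remove more nodes.
            break
    return best_threshold, best_count


# Toy data (hypothetical): paper i has edge degree i, for i = 1..100.
toy_degrees = {f"paper_{i}": i for i in range(1, 101)}
print(pick_degree_threshold(toy_degrees))  # (51, 50)
```

Because the node count can only shrink as the threshold rises, the loop stops as soon as the count drops below the target; at that point the best candidate on either side of the target has already been seen.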

Author information

Corresponding author

Correspondence to Rachael Ruizhu Xiong.

Ethics declarations

Conflicts of Interest

The authors declare that they have no conflict of interest and no affiliations with or involvement in any organization or entity with any financial or non-financial interest in the subject matter or materials discussed in this manuscript. No funding was received to assist with the preparation of this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Fig. 7 Color palette of nodes. *C stands for clusters; AK stands for author-assigned keywords; LDAK stands for LDA-Gensim generated keywords; RF stands for references

Fig. 8 LDA-Gensim generated keywords networks by top words (clusters). a: LDA keywords network Cluster 1. b: LDA keywords network Cluster 1. c: LDA keywords network Cluster 1 & 3. d: LDA keywords network Cluster 2. e: LDA keywords network Cluster 3. f: LDA keywords network Cluster 4. g: LDA keywords network Cluster 5. h: LDA keywords network Cluster 6. i: LDA keywords network Cluster 7. j: LDA keywords network Cluster 8. k: LDA keywords network Cluster 9. l: LDA keywords network Cluster 10. m: LDA keywords network Cluster 11. n: LDA keywords network Cluster 11. o: LDA keywords network Cluster 11

Fig. 9 Citation networks and main path flows by each cluster. a: The most cited paper network (Entire). b: The most citing paper network (Entire). c: Entire Top 50 papers (Edge degree = 47). d: Entire dataset main path. e: Cluster 1 top 48 papers (Edge degree = 23). f: Cluster 1 main path. g: Cluster 2 top 50 papers (Edge degree = 7). h: Cluster 2 main paths. i: Cluster 3 top 52 papers (Edge degree = 17). j: Cluster 3 main path. k: Cluster 4 top 50 papers (Edge degree = 13). l: Cluster 4 main path. m: Cluster 5 top 50 papers (Edge degree = 11). n: Cluster 5 main path. o: Cluster 6 top 50 papers (Edge degree = 8). p: Cluster 6 main path. q: Cluster 7 top 44 papers (Edge degree = 4). r: Cluster 7 main path. s: Cluster 8 top 50 papers (Edge degree = 3). t: Cluster 8 main path. u: Cluster 9 top 50 papers (Edge degree = 4). v: Cluster 9 main path. w: Cluster 10 top 50 papers (Edge degree = 3). x: Cluster 10 main path. y: Cluster 11 top 48 papers (Edge degree = 29). z: Cluster 11 main path

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xiong, R.R., Liu, C.Z. & Choo, KK.R. Synthesizing Knowledge through A Data Analytics-Based Systematic Literature Review Protocol. Inf Syst Front (2023). https://doi.org/10.1007/s10796-023-10432-3

  • DOI: https://doi.org/10.1007/s10796-023-10432-3
