Open Access

A RETRIEVAL-AUGMENTED GENERATION FRAMEWORK FOR EXPLAINABLE ACADEMIC PAPER QUALITY ASSESSMENT

Volume 7, Issue 4, pp. 56-61, 2025

DOI: https://doi.org/10.61784/ejst3102

Author(s)

WeiJing Zhu1, RunTao Ren2*, Wei Xie1, CenYing Yang2

Affiliation(s)

1Guangxi Science and Technology Information Network Center, Nanning 530022, Guangxi, China.

2City University of Hong Kong, Kowloon Tong, Hong Kong SAR, China.

Corresponding Author

RunTao Ren

ABSTRACT

With the exponential growth of global scholarly output, traditional academic paper evaluation methods face significant challenges in reliability, consistency, and scalability. Peer review processes suffer from low inter-rater agreement and lengthy decision times, while bibliometric approaches systematically disadvantage emerging fields. To address these systemic limitations, this study proposes a novel evaluation framework that leverages a Retrieval-Augmented Generation (RAG) architecture and large language models (LLMs). The framework implements a four-dimensional assessment mechanism, analyzing research questions, methodologies, results, and conclusions, supported by contextual knowledge retrieval and explainable judgment generation. Experimental validation demonstrates the superiority of the RAG-based approach over both human experts and conventional machine learning baselines, achieving an F1-score of 0.77 at the quartile level. Additionally, the system provides transparent evaluative judgments supported by comparable evidence from prior literature. This work advances scholarly communication by offering a scalable, explainable, and reliable alternative to existing evaluation paradigms.
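To make the pipeline described above concrete, the following is a minimal, illustrative Python sketch of how such a four-dimensional RAG assessment loop could be wired together. It is a toy under stated assumptions, not the authors' implementation: the `Paper` and `assess` names, the word-overlap retriever standing in for a real dense retriever, and the stubbed LLM call are all hypothetical, since this page does not detail the system's internals.

```python
# Hypothetical sketch of a four-dimensional RAG assessment loop.
# All names and the toy retriever are illustrative assumptions,
# not taken from the paper's actual implementation.
from dataclasses import dataclass

# The four assessment dimensions named in the abstract.
DIMENSIONS = ["research_questions", "methodology", "results", "conclusions"]

@dataclass
class Paper:
    title: str
    sections: dict  # dimension name -> extracted text

def retrieve_context(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Toy lexical retriever (stand-in for a dense retriever):
    rank prior-literature snippets by word overlap with the query."""
    def overlap(doc: str) -> int:
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(corpus, key=overlap, reverse=True)[:k]

def build_prompt(dim: str, text: str, evidence: list[str]) -> str:
    """Assemble an evaluation prompt grounded in retrieved evidence."""
    joined = "\n".join(f"- {e}" for e in evidence)
    return (f"Assess the {dim.replace('_', ' ')} of the paper excerpt below "
            f"against comparable prior work, and justify your judgment.\n"
            f"Prior literature:\n{joined}\nPaper excerpt:\n{text}")

def assess(paper: Paper, corpus: list[str], llm) -> dict:
    """Produce one explainable, evidence-backed judgment per dimension."""
    report = {}
    for dim in DIMENSIONS:
        evidence = retrieve_context(paper.sections[dim], corpus)
        judgment = llm(build_prompt(dim, paper.sections[dim], evidence))
        report[dim] = {"evidence": evidence, "judgment": judgment}
    return report

if __name__ == "__main__":
    # Stub corpus and LLM so the sketch runs without external services.
    corpus = ["Prior survey of citation-based metrics.",
              "A randomized trial on blinding peer reviewers.",
              "A language model pretrained on scientific text."]
    paper = Paper("Demo", {d: f"Placeholder {d} text." for d in DIMENSIONS})
    stub_llm = lambda prompt: "Adequate; consistent with retrieved prior work."
    print(assess(paper, corpus, stub_llm))
```

In a real system the stubbed pieces would be swapped for a dense retriever over a literature index and an actual LLM call; the point of the sketch is only the structure, where each dimension's judgment is generated alongside the retrieved evidence that supports it.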

KEYWORDS

Retrieval-Augmented Generation; Academic paper evaluation; Contextual knowledge retrieval; Explainable AI

CITE THIS PAPER

WeiJing Zhu, RunTao Ren, Wei Xie, CenYing Yang. A Retrieval-Augmented Generation framework for explainable academic paper quality assessment. Eurasia Journal of Science and Technology. 2025, 7(4): 56-61. DOI: https://doi.org/10.61784/ejst3102.


All published work is licensed under a Creative Commons Attribution 4.0 International License.