A Systematic Mapping Review: Tracking the Relationships Between Software Artifacts using NLP

Fedaa Khalil
Ghaida Rebdawi
Nada Ghneim

Abstract

In software development, traceability from requirements to realization is essential, yet manual tracing is labour-intensive and error-prone. Requirements Traceability (RT) is crucial for effective management and impact assessment. This paper explores the application of Natural Language Processing (NLP) to RT through a systematic mapping review of literature published from 2019 to 2023. From 209 initial studies, we selected 49 using stringent criteria. We categorised RT approaches into ontology-based and embedding techniques; embedding techniques have gained prominence for their ability to capture relationships between artifacts. We identified two primary paradigms in RT: rule-based methods, which use predefined heuristics, and machine learning approaches, including traditional classifiers. Machine learning models significantly improve accuracy and adaptability, especially when paired with advanced embeddings. A notable trend is the increasing reliance on standard datasets, such as the CoEST repository, to validate methods, enhance reproducibility, and enable robust comparisons. Despite these advances, challenges persist: non-functional requirements remain underexplored, and the lack of comprehensive benchmarks limits the generalizability of current approaches. Future research should focus on creating inclusive datasets that cover diverse requirements and on integrating hybrid methods to improve performance. Overall, the study underscores the critical role of embedding techniques in RT while highlighting gaps and opportunities for advancing the field.

How to Cite
F. Khalil, G. Rebdawi, and N. Ghneim, “A Systematic Mapping Review: Tracking the Relationships Between Software Artifacts using NLP”, ECTI-CIT Transactions, vol. 19, no. 2, pp. 321–333, Apr. 2025.
Section
Research Article
