A Dynamic Memory Routing Framework for Multimodal Conversational Question Answering in Large Language Models
Abstract
Multimodal Question Answering (MQA) in large language models (LLMs) requires modeling how the relevance of each modality changes across conversational turns. Existing approaches, however, rely on static fusion strategies that weight all modalities uniformly and therefore fail to capture this turn-by-turn variation. To address this limitation, we propose DynaRoute, a dynamic memory routing framework for LLM-based MQA. DynaRoute combines a Bi-LSTM-based conversational memory, which models the evolving dialogue context, with a query-conditioned routing mechanism that assigns modality relevance dynamically at each interaction step. An LLM-based decoder then processes the resulting representations to generate context-aware responses. Experiments on four benchmarks (VQA-v2, GQA, VisDial, and A-OKVQA) demonstrate consistent improvements over unimodal, static-fusion, and mixture-of-experts baselines. DynaRoute achieves improvements of up to 8.7% under noisy conditions and 6.5% under clean settings, and it obtains the highest multi-turn consistency (68.9) and robustness (81.3) scores. These results highlight the effectiveness of memory-aware dynamic routing and establish DynaRoute as a principled framework for conversational multimodal question answering.
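The idea of query-conditioned routing can be illustrated with a minimal sketch. The abstract does not specify DynaRoute's actual routing function, so the scaled dot-product scoring and softmax gating below are assumptions chosen for illustration, as are the names `route_modalities`, `query`, and `feats`. The query vector is assumed to already carry the conversational memory state; the Bi-LSTM memory itself is not modeled here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_modalities(query, modality_feats):
    """Weight each modality by its relevance to the query (hypothetical helper).

    Assumption: relevance is a scaled dot product between the query vector
    and each modality's feature vector, normalized with a softmax.
    Returns (weights by modality name, fused feature vector).
    """
    dim = len(query)
    names = list(modality_feats)
    scores = [
        sum(q * x for q, x in zip(query, modality_feats[name])) / math.sqrt(dim)
        for name in names
    ]
    weights = softmax(scores)
    # Fused representation: relevance-weighted sum of the modality features.
    fused = [
        sum(w * modality_feats[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]
    return dict(zip(names, weights)), fused


# Toy inputs (made up for illustration): the query aligns with the image features.
query = [1.0, 0.0, 0.5, 0.0]
feats = {
    "image": [0.9, 0.1, 0.4, 0.0],
    "text": [0.0, 1.0, 0.0, 0.2],
}
weights, fused = route_modalities(query, feats)
```

In contrast to a static fusion scheme, which would apply the same fixed weights at every turn, the weights here are recomputed whenever the query changes, so a later turn that aligns with the text features would shift weight toward that modality.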

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.