DYNAMIC INDEX PRUNING WITH REINFORCEMENT LEARNING FOR EFFICIENT LONG-CONTEXT GENERATION
Volume 2, Issue 3, Pp 26-34, 2025
DOI: https://doi.org/10.61784/adsj3030
Author(s)
JianYu Huang, PeiLin Xu*, Andrew Collins
Affiliation(s)
School of Computing and Augmented Intelligence, Arizona State University, USA.
Corresponding Author
PeiLin Xu
ABSTRACT
The exponential growth of context lengths in large language models (LLMs) has introduced significant computational challenges, particularly in memory consumption and inference latency. This paper proposes a novel dynamic index pruning framework leveraging reinforcement learning to optimize long-context generation efficiency. By selectively retaining informative tokens while discarding redundant information, our approach reduces computational overhead without compromising generation quality. We formulate the pruning decision as a sequential decision-making problem and employ a policy gradient method to learn optimal pruning strategies. The framework draws inspiration from attention-based neural architectures, where alignment mechanisms dynamically focus on relevant context portions. Experimental results demonstrate that our method achieves up to 40% reduction in memory footprint and 35% improvement in inference speed while maintaining comparable performance on benchmark tasks. The proposed framework addresses the critical bottleneck of attention mechanism scaling and provides a practical solution for deploying LLMs in resource-constrained environments.
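The sketch below is an illustrative reading of the sequential pruning formulation described in the abstract, not the authors' implementation: a per-token keep/drop policy is trained with a REINFORCE-style policy-gradient update, and the reward trades a task-quality proxy against the fraction of cached tokens retained. The PruningPolicy module, the pruning_reward trade-off weight, and the synthetic single-episode update are all hypothetical placeholders (PyTorch is assumed).

# Illustrative sketch only: a policy that decides, for each cached token,
# whether to keep or prune it, updated with a REINFORCE-style gradient.
# All names and the reward shape are assumptions, not the paper's code.
import torch
import torch.nn as nn

class PruningPolicy(nn.Module):
    """Scores each token's hidden state and emits a keep probability."""
    def __init__(self, hidden_dim: int):
        super().__init__()
        self.scorer = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim // 2),
            nn.ReLU(),
            nn.Linear(hidden_dim // 2, 1),
        )

    def forward(self, token_states: torch.Tensor) -> torch.Tensor:
        # token_states: (seq_len, hidden_dim) -> keep probability per token
        return torch.sigmoid(self.scorer(token_states)).squeeze(-1)

def pruning_reward(task_score: float, keep_mask: torch.Tensor, alpha: float = 0.5) -> float:
    """Reward a high task score while penalizing the fraction of tokens kept."""
    kept_fraction = keep_mask.float().mean().item()
    return task_score - alpha * kept_fraction

# One policy-gradient update over a single synthetic episode.
hidden_dim, seq_len = 64, 128
policy = PruningPolicy(hidden_dim)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

token_states = torch.randn(seq_len, hidden_dim)   # stand-in for cached token states
keep_probs = policy(token_states)                 # (seq_len,)
dist = torch.distributions.Bernoulli(probs=keep_probs)
keep_mask = dist.sample()                         # 1 = keep, 0 = prune
log_prob = dist.log_prob(keep_mask).sum()

# In a real system the task score would come from evaluating generation
# quality with the pruned context; here it is a placeholder constant.
task_score = 0.8
reward = pruning_reward(task_score, keep_mask)

loss = -reward * log_prob                         # REINFORCE objective
optimizer.zero_grad()
loss.backward()
optimizer.step()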
KEYWORDS
Dynamic pruning; Reinforcement learning; Long-context generation; Large language models; Attention optimization; Computational efficiency
CITE THIS PAPER
JianYu Huang, PeiLin Xu, Andrew Collins. Dynamic index pruning with reinforcement learning for efficient long-context generation. AI and Data Science Journal. 2025, 2(3): 26-34. DOI: https://doi.org/10.61784/adsj3030.
