Unifying Data Refinement and Fusion Strategies: A Cutting-Edge Methodology for E-Learning Performance Optimization

N S Koti Mani Kumar Tirumanadham; Pravin R. Kshirsagar; Subba Rao Polamuri; Anurag Sharma; I Lakshmi Manikyamba

doi:10.16920/jeet/2026/v39i4/26112

Authors

N S Koti Mani Kumar Tirumanadham, School of Computer Science & Engineering, VIT-AP University, Amaravathi, 522237, Andhra Pradesh, India
Pravin R. Kshirsagar, Professor, Dean (R&D), J D College of Engineering and Management, Nagpur, India
Subba Rao Polamuri, Associate Professor, Department of Computer Science and Engineering, Aditya University, Surampalem, India
Anurag Sharma, Associate Professor, School of Electrical and Electronic Engineering, Newcastle University in Singapore (NUiS), Singapore
I Lakshmi Manikyamba, Associate Professor, Department of Computer Science and Engineering, Jawaharlal Nehru Technological University Hyderabad, UCESTH, Kukatpally, Hyderabad, India

Keywords:

Predictive modelling, Feature selection, E-learning systems, Voting ensemble model, SMOTE algorithm, TRIFEX approach

Abstract

The rapid growth of the e-learning market has created strong demand for intelligent predictive models that bridge the gap between performance analysis and personalized learning systems and thereby improve student performance. Traditional prediction methods, however, are hampered by class imbalance, outliers, irrelevant features, and limited model interpretability. This research introduces a framework for predicting student academic performance using the Students' Academic Performance Dataset (xAPI-Edu-Data), an educational data mining dataset that comprises 480 instances and 16 attributes with multivariate integer and categorical features from an e-learning environment. The framework handles class imbalance with the Synthetic Minority Oversampling Technique (SMOTE), removes outliers using the Interquartile Range (IQR), and standardizes features with Z-score normalization. A novel hybrid feature selection method, TRIFEX, combines ANOVA F-statistics, Recursive Feature Elimination (RFE), and Lasso regularization to select the features that most influence student performance. Logistic Regression, Decision Tree, and K-Nearest Neighbor (KNN) classifiers are used, with hyperparameters optimized using Randomized Search CV, Grid Search CV, and Optuna to improve model efficiency and generalization. In addition, a voting-based ensemble model fuses the strengths of the individual classifiers. Experimental results show that the proposed ensemble model outperforms conventional predictive models, achieving 98.99% accuracy, a 99.00% F1-score, and an RMSE of 0.1005. The results suggest that the proposed method has substantial potential to improve prediction accuracy, feature explainability, and individualized learning support in contemporary e-learning environments.
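The refinement and fusion steps named in the abstract can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the function names, the 1.5 x IQR fence multiplier, the use of hard (majority) voting, and the integer label encoding are all assumptions made for illustration, and SMOTE, TRIFEX, and hyperparameter tuning are left out entirely.

```python
from collections import Counter
from math import sqrt
from statistics import mean, pstdev, quantiles


def iqr_bounds(values, k=1.5):
    """Return the (low, high) fences Q1 - k*IQR and Q3 + k*IQR.

    k=1.5 is the conventional multiplier, assumed here; the abstract
    does not state which value was used.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Keep only the values that lie inside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]


def z_scores(values):
    """Standardize to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def hard_vote(*predictions):
    """Majority vote across classifiers (one label list per classifier).

    Ties go to the label seen first among the tied ones, a simplification.
    """
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


def rmse(y_true, y_pred):
    """Root mean squared error over numerically encoded labels."""
    return sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


# Hypothetical feature column with one extreme value.
raw = [10, 12, 11, 13, 12, 95]
clean = drop_outliers(raw)      # the value 95 falls outside the fences
scaled = z_scores(clean)

# Hypothetical encoded predictions from three classifiers.
lr, dt, knn = [2, 1, 0, 1], [2, 0, 0, 1], [1, 1, 0, 2]
fused = hard_vote(lr, dt, knn)
error = rmse([2, 1, 0, 2], fused)
```

As a consistency check on the reported figures, sqrt(1 - 0.9899) is approximately 0.1005, which matches the reported RMSE if each misclassification contributes a squared error of 1; the abstract does not state how the RMSE was computed.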

Published

2026-04-30

How to Cite

Tirumanadham, N. S. K. M. K., Kshirsagar, P. R., Polamuri, S. R., Sharma, A., & Manikyamba, I. L. (2026). Unifying Data Refinement and Fusion Strategies: A Cutting-Edge Methodology for E-Learning Performance Optimization. Journal of Engineering Education Transformations, 39(4), 114–132. https://doi.org/10.16920/jeet/2026/v39i4/26112

Issue

Volume 39, Issue 4, April 2026

Section

Articles
