Unifying Data Refinement and Fusion Strategies: A Cutting-Edge Methodology for E-Learning Performance Optimization
DOI:
https://doi.org/10.16920/jeet/2026/v39i4/26112
Keywords:
Predictive modelling, Feature selection, E-learning systems, Voting ensemble model, SMOTE algorithm, TRIFEX approach
Abstract
The rapid growth of the e-learning market has created strong demand for intelligent predictive models that bridge the gap between performance analysis and personalized learning systems and thereby improve student performance. Yet traditional prediction methods are hampered by class imbalance, outliers, irrelevant features, and limited model interpretability. This research introduces a framework for predicting student academic performance using an educational data mining dataset, the Students' Academic Performance Dataset (xAPI-Edu-Data), which comprises 480 instances and 16 attributes, with multivariate integer and categorical features drawn from an e-learning environment. The proposed framework handles class imbalance with the Synthetic Minority Oversampling Technique (SMOTE), removes outliers using the Interquartile Range (IQR), and standardizes features with Z-score normalization. A novel hybrid feature selection method, TRIFEX, is proposed to select the features that most influence student performance by combining ANOVA F-statistics, Recursive Feature Elimination (RFE), and Lasso regularization. Logistic Regression, Decision Tree, and K-Nearest Neighbor (KNN) classifiers are used in the study. Hyperparameter optimization is performed using Randomized Search CV, Grid Search CV, and Optuna to improve the efficiency and generalization of the models. In addition, a voting-based ensemble model is built to combine the strengths of the individual classifiers. Experimental results show that the proposed ensemble model outperforms conventional predictive models, achieving 98.99% accuracy, a 99.00% F1-score, and an RMSE of 0.1005.
The results suggest that the proposed method has substantial potential to improve prediction accuracy, feature explainability, and individualized learning support in contemporary e-learning environments.
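As a rough illustration of three of the steps named in the abstract (IQR outlier removal, Z-score normalization, and voting over classifier outputs), the sketch below uses only the Python standard library. It is not the authors' implementation. The function names, the sample values, the class labels, the 1.5 multiplier on the IQR, and the choice of hard (majority) voting are all assumptions made for the example. The abstract does not say whether the ensemble votes on labels or on probabilities. SMOTE, TRIFEX, and hyperparameter tuning are not covered here.

```python
from collections import Counter
from statistics import mean, pstdev, quantiles


def iqr_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (k=1.5 is an assumed default)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if low <= v <= high]


def z_scores(values):
    """Rescale values to zero mean and unit population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # A constant feature carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def hard_vote(predictions):
    """Majority vote per sample; on a tie, the label seen first wins."""
    return [Counter(sample).most_common(1)[0][0] for sample in zip(*predictions)]


# Hypothetical values for one numeric feature, with one obvious outlier.
feature = [10, 12, 15, 18, 20, 22, 25, 300]
cleaned = iqr_filter(feature)
scaled = z_scores(cleaned)

# Hypothetical label predictions from three classifiers for four students.
lr_pred = ["H", "M", "L", "M"]
dt_pred = ["H", "L", "L", "M"]
knn_pred = ["M", "M", "L", "H"]
ensemble_pred = hard_vote([lr_pred, dt_pred, knn_pred])
```

In a full pipeline these steps would normally be fitted on the training split only and then applied to the test split, so that statistics such as the quartiles, mean, and standard deviation do not leak information from held-out data.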