Transactions on Machine Intelligence

An Outlier Detection Approach To Highlight Effective Genes By A Deep Learning Model and An Adjusted Genetic Algorithm (DLAGA)

Document Type : Original Article

Authors
Department of Computer Engineering, Shiraz Branch, Islamic Azad University, Shiraz, Iran
Abstract
Identifying abnormally expressed genes is a critical step in cancer diagnosis and has attracted significant attention within the biomedical research community. Gene expression datasets typically involve high-dimensional data, which poses major challenges during the pre-processing stage, particularly in maintaining the biological relevance and interpretability of selected genes. Traditional gene selection techniques often struggle with high computational demands and fail to preserve the intrinsic biological meaning of genes. In this study, we present an effective two-phase framework for gene selection and classification tailored for cancer diagnosis. The first phase employs a Variational Autoencoder (VAE), a deep learning-based technique, to reduce data dimensionality while capturing essential gene expression patterns. In the second phase, we utilize an Adjusted Genetic Algorithm (AGA) to search for a subset of informative genes. To further enhance classification performance, we integrate a wrapper-based approach within the AGA to individually classify genes relevant to different cancer types. Our method was evaluated on two publicly available microarray datasets. The experimental results reveal that the proposed framework outperforms several existing approaches in terms of classification accuracy, while maintaining reasonable computational efficiency. The integration of VAE and AGA offers a robust and biologically interpretable approach to gene selection, making it a promising tool for advancing precision oncology. These findings underscore the potential of combining deep learning and evolutionary algorithms for effective biomarker discovery in high-dimensional genomic data.
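The second phase described above, a genetic algorithm whose fitness is the accuracy of a classifier trained on the candidate gene subset (the wrapper idea), can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' AGA. The abstract does not specify the adjusted operators, so a plain GA with tournament selection, uniform crossover, bit-flip mutation, and elitism is used instead. The VAE phase is not reproduced, and the input matrix is assumed to be already reduced to a candidate pool of genes. The function names (`make_toy_data`, `nearest_centroid_accuracy`, `fitness`, `ga_select`), the nearest-centroid classifier, the size penalty, and all parameter values are hypothetical choices made for this example.

```python
import numpy as np


def make_toy_data(n_samples=80, n_genes=40, n_informative=4, seed=1):
    """Synthetic expression matrix (assumption): only the first genes carry class signal."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n_samples)
    X = rng.normal(size=(n_samples, n_genes))
    X[:, :n_informative] += 3.0 * y[:, None]  # shift informative genes for class 1
    return X, y


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test):
    """Accuracy of a nearest-centroid classifier (stand-in for the wrapper's classifier)."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    # Squared Euclidean distance from every test sample to every class centroid.
    dists = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    pred = classes[dists.argmin(axis=1)]
    return float((pred == y_test).mean())


def fitness(mask, X_train, y_train, X_val, y_val, penalty=0.01):
    """Wrapper fitness: validation accuracy minus a small penalty on subset size."""
    if not mask.any():
        return 0.0  # an empty gene subset cannot classify anything
    acc = nearest_centroid_accuracy(X_train[:, mask], y_train, X_val[:, mask], y_val)
    return acc - penalty * mask.mean()


def ga_select(X_train, y_train, X_val, y_val, pop_size=30, generations=40,
              mutation_rate=0.02, seed=0):
    """Search for a boolean gene mask with a plain (not 'adjusted') genetic algorithm."""
    rng = np.random.default_rng(seed)
    n_genes = X_train.shape[1]
    pop = rng.random((pop_size, n_genes)) < 0.5  # random initial masks
    best_mask, best_fit = None, -np.inf
    for _ in range(generations):
        scores = np.array([fitness(ind, X_train, y_train, X_val, y_val) for ind in pop])
        i = scores.argmax()
        if scores[i] > best_fit:
            best_fit, best_mask = scores[i], pop[i].copy()
        # Binary tournament selection: the fitter of two random individuals becomes a parent.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = pop[np.where(scores[a] >= scores[b], a, b)]
        # Uniform crossover between each parent and a randomly assigned partner.
        partners = parents[rng.permutation(pop_size)]
        cross = rng.random((pop_size, n_genes)) < 0.5
        children = np.where(cross, parents, partners)
        # Bit-flip mutation, then elitism: the best mask seen so far always survives.
        pop = children ^ (rng.random((pop_size, n_genes)) < mutation_rate)
        pop[0] = best_mask
    return best_mask, float(best_fit)


if __name__ == "__main__":
    X, y = make_toy_data()
    mask, fit = ga_select(X[:40], y[:40], X[40:], y[40:])
    print("selected gene indices:", np.flatnonzero(mask))
    print("best fitness:", round(fit, 3))
```

Scoring each mask on a held-out validation split rather than on the training samples is what makes this a wrapper method. The size penalty is one simple way to favour smaller gene subsets and is an assumption of this sketch.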
Keywords

Volume 7, Issue 3
Spring 2024
Pages 223-237

  • Receive Date 01 July 2024
  • Revise Date 28 August 2024
  • Accept Date 23 September 2024