2025 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology, CIBCB
Abstract
Long non-coding RNAs (lncRNAs) have gained significant attention due to their crucial roles in the pathogenesis of complex human diseases, such as neurological diseases, cardiovascular diseases, AIDS, diabetes, and various types of cancer. In the machine learning literature, lncRNA-disease association (LDA) prediction has been widely investigated as a binary classification problem, where each lncRNA-disease pair is treated as an independent instance. This approach has several drawbacks: it does not exploit the correlation among diseases, it aggravates the existing class imbalance, and it substantially increases the execution time. Furthermore, the literature focuses on the transductive setting, where new disease associations are predicted for lncRNAs already seen by the model, which naturally restricts its application to those lncRNAs. As a solution, we propose to address LDA prediction as a structured output prediction problem, namely (hierarchical) multi-label classification, where all LDAs are predicted at once for a given lncRNA. We compared several LDA methods and their structured output variants with recent (hierarchical) multi-label classification methods in an inductive setting, i.e., disease associations are predicted for unseen lncRNAs. Our experiments reveal that approaching LDA prediction with structured output prediction leads to superior or competitive results while drastically reducing the running time.
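To make the contrast between the two formulations concrete, the following minimal sketch builds both views from the same toy data. It is an illustration under stated assumptions, not the implementation evaluated in this work: the feature matrix, the association matrix, the one-hot disease encoding, and the helper names `to_pairwise` and `inductive_split` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (assumed for illustration): 6 lncRNAs, 4 diseases, 3 features.
n_lncrnas, n_diseases, n_features = 6, 4, 3
X = rng.normal(size=(n_lncrnas, n_features))                  # lncRNA features
Y = (rng.random((n_lncrnas, n_diseases)) < 0.2).astype(int)   # association matrix


def to_pairwise(X, Y):
    """Binary formulation: one instance per (lncRNA, disease) pair.

    Each lncRNA row is repeated once per disease and concatenated with a
    one-hot disease indicator (a hypothetical stand-in for disease features).
    """
    n, m = Y.shape
    lnc_part = np.repeat(X, m, axis=0)          # x0, x0, ..., x1, x1, ...
    dis_part = np.tile(np.eye(m), (n, 1))       # d0, d1, ..., d0, d1, ...
    pair_X = np.hstack([lnc_part, dis_part])
    pair_y = Y.reshape(-1)                      # row-major, matches the order above
    return pair_X, pair_y


def inductive_split(n, n_test):
    """Hold out whole lncRNAs, so test lncRNAs are never seen in training."""
    idx = np.arange(n)
    return idx[:-n_test], idx[-n_test:]


# Pairwise view: n_lncrnas * n_diseases instances with a single binary target.
pair_X, pair_y = to_pairwise(X, Y)

# Multi-label view: one instance per lncRNA, with a label vector over all diseases.
train_idx, test_idx = inductive_split(n_lncrnas, n_test=2)
X_train, Y_train = X[train_idx], Y[train_idx]
X_test, Y_test = X[test_idx], Y[test_idx]

print("pairwise instances:", pair_X.shape[0])
print("multi-label instances:", X.shape[0])
```

In the pairwise view the number of training instances grows with the number of diseases, and every additional disease contributes mostly negative pairs, whereas the multi-label view keeps one instance per lncRNA and exposes the full label vector to the learner.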