Authors: Çakı, O.; Karaçalı, B.
Department: Department of Electrical and Electronics Engineering, Faculty of Engineering, Izmir Institute of Technology
Date accessioned/available: 2024-01-06
Date issued: 2022
ISSN: 1868-1743; 1868-1751
DOI: https://doi.org/10.1002/minf.202100118
Handle: https://hdl.handle.net/11147/14200

Abstract: In-silico compound-protein interaction prediction addresses prioritization of drug candidates for experimental biochemical validation because the wet-lab experiments are time-consuming, laborious and costly. Most machine learning methods proposed to that end approach this problem with supervised learning strategies in which known interactions are labeled as positive and the rest are labeled as negative. However, treating all unknown interactions as negative instances may lead to inaccuracies in real practice since some of the unknown interactions are bound to be positive interactions waiting to be identified as such. In this study, we propose to address this problem using the Quasi-Supervised Learning (QSL) algorithm. In this framework, potential interactions are predicted by estimating the overlap between a true positive dataset of compound-protein pairs with known interactions and an unknown dataset of all the remaining compound-protein pairs. The potential interactions are then identified as those in the unknown dataset that overlap with the interacting pairs in the true positive dataset in terms of the associated similarity structure. We also address the class-imbalance problem by modifying the conventional cost function of the QSL algorithm. Experimental results on GPCR and Nuclear Receptor datasets show that the proposed method can identify actual interactions from all possible combinations.
© 2021 Wiley-VCH GmbH.

Title: Quasi-Supervised Strategies for Compound-Protein Interaction Prediction
Type: Article
Language: en
Access rights: info:eu-repo/semantics/closedAccess
Scopus ID: 2-s2.0-85119954344
Keywords: Chemoinformatics; Compound Similarity; Compound-Protein Interactions; Drug Discovery; Machine Learning; Cost functions; Drug interactions; Learning algorithms; Learning systems; Supervised learning; Compound-protein interaction; In-silico; Interaction prediction; Machine-learning; Protein interaction; Quasi-supervised learning; True positive; Proteins; alpha 2C adrenergic receptor; amcinonide; beta 1 adrenergic receptor; beta 2 adrenergic receptor; betaxolol; bisoprolol; cell nucleus receptor; chenodeoxycholic acid; chlorpromazine; chlorpromazine derivative; chlorpromazine hibenzate; chlorpromazine phenolphthalinate; cholesterol; cicloprolol; clozapine; denopamine; dipivefrine; dopamine 2 receptor; dopamine 3 receptor; dydrogesterone; epinephrine; eplerenone; estrogen receptor alpha; ethinylestradiol; etretinate; fenoldopam mesilate; fluoxymesterone; G protein coupled receptor; histamine H1 receptor; hydrocortisone; isoetarine; isotretinoin; levodopa; loteprednol etabonate; methoxamine; metixene; metoprolol; mifepristone; muscarinic M3 receptor; nandrolone phenpropionate; norethisterone; oxandrolone; oxymetazoline; perphenazine; pregnenolone; progesterone receptor; retinoic acid receptor beta; retinoic acid receptor gamma; retinoid X receptor alpha; retinoid X receptor gamma; ritodrine; salbutamol; salbutamol sulfate; spironolactone; tazarotene; terbutaline sulfate; testosterone; unclassified drug; protein; chemical interaction; compound protein interaction; genome analysis; intermethod comparison; kernel method; Kolmogorov Smirnov test; learning algorithm; molecular fingerprinting; pairwise kernel method; prediction; quasi supervised learning algorithm; semi supervised machine learning; algorithm; chemistry; Algorithms
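The abstract describes estimating the overlap between a known-positive set and an unknown set of compound-protein pairs. The sketch below illustrates that idea with a simplified nearest-neighbor stand-in. It scores each unknown pair by how many of its nearest neighbors come from the known-positive set, then ranks the unknown pairs as interaction candidates.

This is not the authors' QSL implementation. It rests on several assumptions:

- Each pair is already encoded as a numeric feature vector.
- Distance is Euclidean.
- The neighborhood size n is a fixed input.
- The paper's modified cost function for class imbalance is not reproduced. A simple rescaling of neighbor counts by set size stands in for it.

The functions `balanced_overlap_scores` and `rank_candidates` are hypothetical names introduced for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]  # assumed numeric encoding of one compound-protein pair


def balanced_overlap_scores(
    positives: List[Vector], unknowns: List[Vector], n: int
) -> List[float]:
    """Score each unknown pair by its overlap with the known-positive set.

    For every unknown pair, take its n nearest neighbors among all known
    positives and all *other* unknown pairs (Euclidean distance), count how
    many come from each set, and divide each count by that set's size.
    The rescaling keeps the much larger unknown set from swamping the score;
    it is an assumed imbalance correction, not the paper's modified QSL cost.
    """
    if not positives or len(unknowns) < 2 or n < 1:
        raise ValueError("need >= 1 positive, >= 2 unknowns and n >= 1")
    n_pos = len(positives)
    n_unk = len(unknowns) - 1  # the query itself is left out of the reference set
    scores: List[float] = []
    for i, query in enumerate(unknowns):
        # (distance, is_known_positive) for every reference pair
        refs: List[Tuple[float, bool]] = [(math.dist(query, p), True) for p in positives]
        refs += [(math.dist(query, u), False) for j, u in enumerate(unknowns) if j != i]
        refs.sort(key=lambda r: r[0])
        nearest = refs[:n]
        k_pos = sum(1 for _, is_pos in nearest if is_pos)
        k_unk = len(nearest) - k_pos
        rate_pos = k_pos / n_pos  # neighbor count relative to positive-set size
        rate_unk = k_unk / n_unk  # neighbor count relative to unknown-set size
        # nearest is non-empty, so at least one of the two rates is positive
        scores.append(rate_pos / (rate_pos + rate_unk))
    return scores


def rank_candidates(scores: List[float], top: int) -> List[int]:
    """Indices of the `top` unknown pairs with the highest overlap scores."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top]


if __name__ == "__main__":
    # Toy 2-D encodings: three known positives near the origin, one unknown
    # pair close to them and four unknown pairs in a separate, distant cluster.
    positives = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
    unknowns = [(0.05, 0.05), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
    scores = balanced_overlap_scores(positives, unknowns, n=3)
    print(scores)                          # [1.0, 0.0, 0.0, 0.0, 0.0]
    print(rank_candidates(scores, top=1))  # [0]
```

The per-set rescaling matters when positives are scarce. Suppose a query's three nearest neighbors are one of two known positives and two of four other unknowns. The raw positive fraction is then 1/3, but the rescaled score is 0.5, because each set contributes the same share of its members. A faithful reproduction would replace this heuristic with the cost function described in the article.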