Protein Etkileşim Tahmini için Pozitif Etiketsiz Öğrenme Algoritmalarının Geliştirilmesi

Pancaroğlu, Doruk

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.11851/934

Title:	Protein Etkileşim Tahmini için Pozitif Etiketsiz Öğrenme Algoritmalarının Geliştirilmesi
Other Titles:	Improving Positive Unlabeled Learning Algorithms for Protein Interaction Prediction
Authors:	Pancaroğlu, Doruk
Advisors:	Tan, Mehmet
Keywords:	Computer Engineering and Computer Science and Control; Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol
Publisher:	TOBB Ekonomi ve Teknoloji Üniversitesi Fen Bilimleri Enstitüsü
Source:	Pancaroğlu, D. (2014). Protein etkileşim tahmini için pozitif etiketsiz öğrenme algoritmalarının geliştirilmesi. Ankara: TOBB ETÜ Fen Bilimleri Enstitüsü. [Unpublished master's thesis]
Abstract:	In binary classification for protein interaction prediction, labeling a pair of proteins as negative (non-interacting) is difficult, because it is hard to obtain training samples that are known never to interact. Furthermore, protein interaction databases do not include negative samples, even when a pair has been shown not to interact. This difficulty in obtaining true negative samples has created a need for learning algorithms that do not use negative samples.
This study aims to improve two positive unlabeled learning algorithms, AGPS and Roc-SVM, which were chosen for their strong performance, for protein interaction prediction. Two extensions are proposed: the first replaces the support vector machine (SVM) classifier used by these algorithms with a Random Forest classifier (AGPS-RF and Roc-RF), and the second combines the results of AGPS and Roc-SVM through a voting system (Hybrid Algorithm). The extended algorithms were then compared with the original algorithms and with two well-known learning algorithms, ARACNE and CLR. In these comparisons, AGPS-RF, Roc-RF, and the Hybrid Algorithm performed better than their SVM-based predecessors, and Roc-RF and the Hybrid Algorithm also performed better than ARACNE and CLR.
URI:	https://tez.yok.gov.tr/UlusalTezMerkezi/tezSorguSonucYeni.jsp https://hdl.handle.net/20.500.11851/934
Appears in Collections:	Bilgisayar Mühendisliği Yüksek Lisans Tezleri / Computer Engineering Master Theses