Derin Öğrenme ile Görüntülerde Kararlı Öznitelik Eşleme Tekniklerinin Geliştirilmesi

Aydoğdu, Muhammet Fatih

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.11851/12449

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Demirci, Muhammed Fatih	-
dc.contributor.author	Aydoğdu, Muhammet Fatih	-
dc.date.accessioned	2025-04-11T19:53:24Z	-
dc.date.available	2025-04-11T19:53:24Z	-
dc.date.issued	2024	-
dc.identifier.uri	https://tez.yok.gov.tr/UlusalTezMerkezi/TezGoster?key=E_eEUHQic_C-LvhxNQn1W7b495AL6LHlVF2orOZ9QDL1G1wbcmNBDjaTCqiYKeHf	-
dc.identifier.uri	https://hdl.handle.net/20.500.11851/12449	-
dc.description.abstract	Bilgisayarlı görü teknikleri görüntüler üzerinde tespit edilen öznitelik noktalarından yaygın bir şekilde yararlanmaktadırlar. Bu öznitelik noktaları kullanılarak görüntü çiftleri arasında yararlı tutarlılıklar tespit edilebilir. Benzerlikler kullanılarak görüntü eşleştirme, nesne tanıma, görüntü dikişleme, görüntü mozaiği oluşturma ve nesne takibi gibi birçok uygulama için başarılı ürünler elde edilebilir. Görüntü çiftleri üzerinde tespit edilen öznitelik noktalarının eşleştirilmesi sırasında noktaların öznitelik uzayında birbirlerine göre uzaklıkları temel alınır. Öznitelik uzayında birbirlerine en yakın olan özniteliklerin yakınlıkları eğer yeterince özgünse bu öznitelikler varsayılan eşleşmeler olarak kabul edilir. Ancak bu varsayılan eşleşmeler çoğu zaman hatalı eşleşmeleri tümüyle saf dışı bırakamaz. Bunun için literatürde tekrarlamalı algoritmalardan faydalanarak en çok sayıda varsayılan eşleşmeyi içerecek şekilde bir geometrik tutarlılık elde edilmeye çalışılır. Elde edilen geometrik tutarlılık sayesinde varsayılan eşleşmelerde bulunan hatalı eşleşmeler elenir. Bu yöntemdeki temel sorun tüm görüntü çiftlerinde başarıya götürecek bir tekrarlama sayısının elde edilmesinin pratikteki imkansızlığıdır. Derin öğrenme yöntemlerinin literatürde birçok problemde alternatiflerine göre daha etkin sonuçlar elde etmesinden sonra çoğu bilgisayarlı görü ve görüntü işleme probleminde olduğu gibi öznitelik eşleştirme problemi için de derin öğrenme ile eğitilmiş yapay sinir ağları kullanan çözümler literatüre yerleşmişlerdir. Bu tezde, kararlı öznitelik eşleme yapan derin öğrenme ağları ile bağıl kamera pozisyonu tahmini problemine çözümler geliştirilmiştir. Öncelikli olarak literatürdeki çalışmaların hepsine temel oluşturan n-n'lik çatı incelenmiştir. n-n'lik bu çatının varsayılan eşlemelerdeki özniteliklerin koordinatlarından oluşan bir küme tipi girdi üzerinde çalıştığı gözlemlenmiştir. 
n-n'lik çatıya ait girdiyi işleyen literatürdeki çalışmaların varsayılan öznitelik eşleşmelerinde genel bağlam ve yerel bağlam çıkarırken karşılaştığı zorluklar incelenmiştir. n-n'lik çatıya alternatif olarak 1-1'lik alternatif bir çatı oluşturulmuştur. n-n'lik çatıda her bir yığın örneğinde tek bir görüntü çiftine ait veri bulunurken öne sürülen 1-1'lik çatıda her bir yığında sadece tek bir görüntü çiftine ait veri bulunmaktadır. 1-1'lik çatıdaki görüntü çiftine ait her bir varsayılan eşleşme için özel bir bağlam kanalının kullanılmasına imkan sağlanmaktadır. Dahası bu bağlam kanalındaki girdi satırların her bir varsayılan eşleşme için özel olarak sıralanabilmesi mümkündür. Tez kapsamında 1-1'lik çatı kullanan çok sayıda ve farklı tipte ağ katmanları içeren yapay sinir ağları oluşturulmuştur. Oluşturulan 1-1'lik yapay sinir ağları ve literatürdeki n-n'lik başarılı sinir ağları Tensor İşlem Birimleri üzerinde eğitilmişlerdir. Eğitimlerde mimarilere ait hesapsal grafiklerdeki parametreler güncellenirken birden fazla kayıp fonksiyonunun birleşiminden oluşan bileşke kayıp fonksiyonundan faydalanılmıştır. Başarım metriği olarak minimum ortalama hassasiyet ölçütü temel alınmıştır. Elde edilen sonuçlara göre 1-1'lik çatı için oluşturulan yapay sinir ağları literatürdeki n-n'lik yapay sinir ağlarının biri hariç tümünün başarımlarını %30'a varan farklar ile geride bırakmıştır. Ayrıca tezin asıl konusu üzerinde çalışmaya başlamadan önce derin öğrenme kullanarak yapay sinir ağlarının eğitilmesine aşinalığı arttırmak için bir durum çalışması yapılmıştır. Bunun için portre fotoğrafları üzerinden yaş sınıfı tahmini yapan yapay sinir ağları geliştirilmiştir. Kullanılan veri setindeki portreler 6 sınıfa ayrılmıştır. Literatürdeki 6 katmanlı bir yapay sinir ağının 18 ve 34 katmanlı artık sinir ağlarına göre daha başarılı olduğu gözlemlenmiştir. 
Kullanılan artık sinir ağlarının 6 sınıflı yaş tahmini problemi için aşırı öğrenmeye sebep olacak kadar derin olduğu veya veri setinin yeterince zengin olmadığı sonucuna varılmıştır.	-
dc.description.abstract	Computer vision techniques make wide use of feature points detected in images. Using these feature points, useful consistencies can be detected between pairs of images. These consistencies support applications such as image matching, object recognition, image stitching, image mosaicking, and object tracking. When feature points detected in a pair of images are matched, the distances between the points in feature space are used as the basis. If the nearest neighbors in feature space are sufficiently distinctive, they are accepted as putative matches. However, putative matches usually still contain incorrect matches. To address this, the literature uses iterative algorithms that search for a geometric consistency supported by the largest possible number of putative matches, and the incorrect matches are then eliminated using that consistency. The main issue with this approach is that it is practically impossible to find an iteration count that guarantees success for all image pairs. After deep learning methods outperformed their alternatives on many problems in the literature, solutions based on artificial neural networks trained with deep learning became established for feature matching, as they did for most computer vision and image processing problems. In this thesis, solutions to the relative camera pose estimation problem were developed using deep learning networks that perform robust feature matching. First, the n-to-n framework, which forms the basis of all studies in the literature, was examined. This framework was observed to operate on a set-type input consisting of the coordinates of the features in the putative matches. 
The challenges that studies in the literature face when extracting global and local context from putative feature matches using this framework's input were analyzed. As an alternative to the n-to-n framework, a one-to-one framework was proposed. In the n-to-n framework, each sample in a batch contains the data of a single image pair, whereas in the proposed one-to-one framework an entire batch contains the data of only a single image pair. The one-to-one framework allows a dedicated context channel to be used for each putative match of the image pair. Moreover, the input rows in this context channel can be sorted specifically for each putative match. Within the scope of the thesis, numerous artificial neural networks that use the one-to-one framework and contain different types of layers were built. These one-to-one networks, along with the successful n-to-n networks from the literature, were trained on Tensor Processing Units (TPUs). During training, the parameters in the computational graphs of the architectures were updated using a composite loss function that combines multiple loss functions. The minimum average precision was used as the performance metric. According to the results, the one-to-one networks outperformed all but one of the n-to-n networks from the literature by margins of up to 30%. Additionally, before work on the main topic of the thesis began, a case study was conducted to build familiarity with training artificial neural networks using deep learning. For this purpose, artificial neural networks were developed to estimate age classes from portrait photographs. The portraits in the dataset were divided into six classes. A 6-layer artificial neural network from the literature was observed to perform better than 18-layer and 34-layer residual neural networks. 
It was concluded that the 18-layer and 34-layer residual networks might have been unnecessarily deep for the 6-class age estimation problem, leading to overfitting, or that the dataset lacked sufficient diversity to clearly demonstrate the advantages of deeper networks.	en_US
dc.language.iso	tr	-
dc.subject	Bilgisayar Mühendisliği Bilimleri-Bilgisayar ve Kontrol	-
dc.subject	Computer Engineering and Computer Science and Control	en_US
dc.title	Derin Öğrenme ile Görüntülerde Kararlı Öznitelik Eşleme Tekniklerinin Geliştirilmesi	-
dc.title	Development of Robust Feature Matching Techniques in Images Using Deep Learning	en_US
dc.type	Doctoral Thesis	en_US
dc.department	Fen Bilimleri Enstitüsü / Bilgisayar Mühendisliği Ana Bilim Dalı	-
dc.identifier.endpage	110	-
dc.identifier.yoktezid	916976	-
item.cerifentitytype	Publications	-
item.fulltext	No Fulltext	-
item.grantfulltext	none	-
item.languageiso639-1	tr	-
item.openairetype	Doctoral Thesis	-
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
Appears in Collections:	Bilgisayar Mühendisliği Doktora Tezleri / Computer Engineering PhD Theses
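
Note on the classical pipeline described in the abstract: the abstract says that nearest neighbors in feature space are accepted as putative matches when they are sufficiently distinctive, and that iterative geometric verification suffers from the lack of a single iteration count that works for every image pair. The sketch below is only an illustration of those two ideas and is not taken from the thesis. It assumes a ratio test as the distinctiveness criterion (the 0.8 threshold is an assumed value) and the standard RANSAC iteration estimate; the function names `putative_matches` and `ransac_iterations` are hypothetical.

```python
import math


def putative_matches(desc_a, desc_b, ratio=0.8):
    """Nearest-neighbor matching with a ratio test (assumed criterion).

    desc_a, desc_b: lists of descriptor vectors of equal dimension.
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    matches = []
    for i, da in enumerate(desc_a):
        # Distances from descriptor i to every descriptor of the other image.
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        if len(dists) < 2:
            continue
        (d1, j1), (d2, _) = dists[0], dists[1]
        # Accept only if the best candidate is clearly closer than the runner-up.
        if d1 < ratio * d2:
            matches.append((i, j1))
    return matches


def ransac_iterations(inlier_ratio, sample_size, confidence=0.99):
    """Iterations needed to draw at least one all-inlier minimal sample
    with the requested confidence (standard RANSAC estimate)."""
    if not 0.0 < inlier_ratio <= 1.0:
        raise ValueError("inlier_ratio must be in (0, 1]")
    p_good = inlier_ratio ** sample_size  # chance that one sample is all inliers
    if p_good >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_good))


if __name__ == "__main__":
    a = [(0.0, 0.0), (10.0, 10.0)]
    b = [(0.0, 0.1), (5.0, 5.0), (10.0, 10.1)]
    print(putative_matches(a, b))
    # The required iteration count depends strongly on the (unknown) inlier ratio.
    for w in (0.9, 0.5, 0.1):
        print(w, ransac_iterations(w, sample_size=5))
```

Because the inlier ratio differs from one image pair to the next and is not known in advance, the required iteration count varies by orders of magnitude, which is one way to read the limitation the abstract points out.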


Page view(s)

98

checked on Jul 7, 2025
