Bağlı veri üzerinde dağıtık sorgulama optimizasyonu (Federated query optimization on linked data)

Özkan, Ethem Cem

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.11851/2293

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Doğdu, Erdoğan	-
dc.contributor.author	Özkan, Ethem Cem	-
dc.date.accessioned	2019-12-25T10:09:40Z	-
dc.date.available	2019-12-25T10:09:40Z	-
dc.date.issued	2015
dc.identifier.citation	Özkan, E. (2015). Bağlı veri üzerinde dağıtık sorgulama optimizasyonu. Ankara: TOBB ETÜ Fen Bilimleri Enstitüsü. [Unpublished master's thesis]	en_US
dc.identifier.uri	https://hdl.handle.net/20.500.11851/2293	-
dc.identifier.uri	https://tez.yok.gov.tr/UlusalTezMerkezi/tezSorguSonucYeni.jsp	-
dc.description.abstract	SPARQL is the standard query language of the semantic web and is used to query "linked data" sources, which are large semantic web data sources. By writing federated queries, SPARQL can also be used to query distributed linked data sources. In this process, the query or its sub-queries are executed on different data sources and the results are combined into the result of the query. In this thesis, we propose an algorithm called "unique predicate source pruning" (UPSP) that selects data sources for a federated SPARQL query. The goal of the algorithm is to find the relevant linked data sources before the federated SPARQL query is executed. This way, instead of being sent to all data sources, the query can be sent only to the data sources that hold data relevant to it and can therefore contribute to its result. The proposed algorithm first matches the sub-query types in the query, called star, path, sink and hybrid. It then checks, for every node in the query, the applicable unique predicate types, called subject-subject, subject-object, object-subject and object-object. If the algorithm finds an applicable unique predicate type and sub-query type, it prunes the external data sources. The UPSP algorithm uses an index structure built offline in advance. This index structure is designed to be compatible with Hibiscus, an earlier work in this area. Four optional fields, one for each unique predicate type, have been added to the Hibiscus index structure. The UPSP algorithm is implemented on top of Hibiscus, an open-source federated query engine, and runs immediately before the Hibiscus source pruning algorithm. The algorithm's performance was compared with the original Hibiscus source pruning method using the FedBench benchmark. The results show that the algorithm improves source selection by up to 20% in some cases.	tr_TR
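The first step described in the abstract, matching star, path, sink and hybrid sub-query patterns, can be sketched by reading each triple pattern of a basic graph pattern as a directed edge from subject to object and labelling every join node by its in- and out-degree. This is a minimal Python sketch, not the thesis's code: the degree rules below follow one common reading of these node types (star: several outgoing edges and no incoming; sink: several incoming and no outgoing; path: exactly one of each; hybrid: any other mix), and the names `TriplePattern` and `classify_join_nodes` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class TriplePattern(NamedTuple):
    s: str  # subject (variable or term)
    p: str  # predicate
    o: str  # object (variable or term)


def classify_join_nodes(bgp: list[TriplePattern]) -> dict[str, str]:
    """Label each join node of a basic graph pattern as star, path, sink or hybrid.

    Every triple pattern is read as a directed edge s -> o. Nodes touched by
    only one triple pattern are not join nodes and are left out.
    """
    indeg: dict[str, int] = defaultdict(int)
    outdeg: dict[str, int] = defaultdict(int)
    for tp in bgp:
        outdeg[tp.s] += 1
        indeg[tp.o] += 1

    labels: dict[str, str] = {}
    for node in set(indeg) | set(outdeg):
        i, o = indeg[node], outdeg[node]
        if i + o < 2:
            continue  # not a join node
        if i == 0:
            labels[node] = "star"  # only outgoing edges: shared subject
        elif o == 0:
            labels[node] = "sink"  # only incoming edges: shared object
        elif i == 1 and o == 1:
            labels[node] = "path"  # object of one pattern, subject of the next
        else:
            labels[node] = "hybrid"  # several edges one way, at least one the other way
    return labels


# Placeholder prefixed names; not taken from FedBench.
bgp = [
    TriplePattern("?a", "ex:p1", "ex:C"),
    TriplePattern("?a", "ex:p2", "?b"),
    TriplePattern("?b", "ex:p3", "?c"),
]
print(sorted(classify_join_nodes(bgp).items()))  # [('?a', 'star'), ('?b', 'path')]
```

In this example `ex:C` and `?c` each touch a single pattern, so only `?a` and `?b` are reported as join nodes.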
dc.description.abstract	SPARQL is the standard query language of the Semantic Web and is used to query linked data sources, which are large Semantic Web data sources. SPARQL can also be used to query "distributed" linked data sources by writing federated SPARQL queries; in that case the query or its sub-queries are executed at separate sites, and the results are combined and returned as the result of the query. In this thesis, we propose a new algorithm called "unique predicate source pruning" (UPSP) that reduces federated SPARQL query execution time. The idea behind the algorithm is to find all relevant distributed linked data sources before a federated SPARQL query is executed. This way, the query is sent not to all data sources but only to the linked data sources that hold data relevant to the query and may therefore return results. The UPSP algorithm first checks the sub-query patterns in the query being processed, looking for "star", "path", "hybrid" and "sink" patterns. For each node, it then checks the applicable unique predicate types: subject-subject, subject-object, object-subject and object-object. If the UPSP algorithm finds an applicable unique predicate type for a query pattern, it prunes all external sources. The UPSP algorithm uses an index structure that is built offline before the algorithm executes. This index is designed to be compatible with the Hibiscus index previously proposed in the literature and adds four optional fields, one for each unique predicate type. We implemented the UPSP algorithm on Hibiscus, an open-source federated SPARQL query engine; UPSP runs just before the Hibiscus pruning algorithm. We evaluated UPSP with the FedBench benchmark, comparing its performance against standard Hibiscus source selection. The results show that the algorithm improves source pruning by up to 20% in some cases.	en_US
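The pruning step can be illustrated with a hypothetical index that stores, per source and predicate, four optional flags named after the four unique predicate types. The abstract does not define what a flag means, so this sketch assumes that a flag set for a join type says the values the predicate produces in that source, at the joined position, occur in no other source. Under that assumption, a partner pattern joined through such a predicate cannot match data outside the first pattern's sources, so the external sources can be dropped. `PredicateEntry`, `join_type` and `prune_partner` are illustrative names, not Hibiscus APIs.

```python
from dataclasses import dataclass
from typing import Optional

Triple = tuple[str, str, str]  # (subject, predicate, object) of a triple pattern


@dataclass
class PredicateEntry:
    """Hypothetical index entry for one predicate in one source.

    One optional flag per unique predicate type; None means "not computed".
    Assumed meaning of e.g. `so=True`: the subjects of this predicate in this
    source never appear as objects in any other source.
    """
    ss: Optional[bool] = None
    so: Optional[bool] = None
    os: Optional[bool] = None
    oo: Optional[bool] = None


def join_type(var: str, a: Triple, b: Triple) -> str:
    """Return 'ss', 'so', 'os' or 'oo' from where `var` occurs in a and in b."""
    pos_a = "s" if a[0] == var else "o"
    pos_b = "s" if b[0] == var else "o"
    return pos_a + pos_b


def prune_partner(
    index: dict[str, dict[str, PredicateEntry]],
    var: str,
    a: Triple,
    b: Triple,
    sources_a: set[str],
    sources_b: set[str],
) -> set[str]:
    """Narrow the candidate sources of pattern b, joined with pattern a on `var`.

    If every candidate source of a flags a's predicate as unique for this join
    type, b can only match inside those sources, so all other (external)
    sources are pruned. Otherwise b's candidates are returned unchanged.
    """
    if not sources_a:
        return set(sources_b)  # nothing to reason about; stay conservative
    jt = join_type(var, a, b)
    for src in sources_a:
        entry = index.get(src, {}).get(a[1])
        if entry is None or getattr(entry, jt) is not True:
            return set(sources_b)  # no uniqueness guarantee for this source
    return sources_b & sources_a


# Hypothetical index and patterns for illustration only.
index = {
    "D1": {"ex:knows": PredicateEntry(ss=True)},
    "D2": {"ex:name": PredicateEntry()},
}
a = ("?x", "ex:knows", "?y")
b = ("?x", "ex:name", "?n")
print(prune_partner(index, "?x", a, b, {"D1"}, {"D1", "D2", "D3"}))  # {'D1'}
```

The rule is deliberately conservative: a single candidate source without the flag keeps every candidate of the partner pattern, which mirrors the idea of pruning only when a unique predicate type actually applies.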
dc.language.iso	tr	en_US
dc.publisher	TOBB University of Economics and Technology,Graduate School of Engineering and Science	en_US
dc.publisher	TOBB ETÜ Fen Bilimleri Enstitüsü	tr_TR
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	Unique predicate	en_US
dc.subject	Query optimization	en_US
dc.subject	Federated querying	en_US
dc.subject	Linked data	en_US
dc.subject	Biricik yüklem (unique predicate)	tr_TR
dc.subject	Sorgu optimizasyonu (query optimization)	tr_TR
dc.subject	Dağıtık sorgulama (federated querying)	tr_TR
dc.subject	Bağlı veri (linked data)	tr_TR
dc.title	Bağlı veri üzerinde dağıtık sorgulama optimizasyonu	en_US
dc.title.alternative	Federated query optimization on linked data	en_US
dc.type	Master Thesis	en_US
dc.department	Institutes, Graduate School of Engineering and Science, Computer Engineering Graduate Programs	en_US
dc.department	Enstitüler, Fen Bilimleri Enstitüsü, Bilgisayar Mühendisliği Ana Bilim Dalı	tr_TR
dc.relation.publicationcategory	Thesis	en_US
item.fulltext	With Fulltext	-
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.languageiso639-1	tr	-
item.cerifentitytype	Publications	-
item.openairetype	Master Thesis	-
item.grantfulltext	open	-
Appears in Collections:	Bilgisayar Mühendisliği Yüksek Lisans Tezleri / Computer Engineering Master Theses