Öbek Bilgisayarlarda Paralel Fp-growth Gerçekleştirimi

Özdoğan, Gülistan Özdemir

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.11851/336

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Abul, Osman	-
dc.contributor.author	Özdoğan, Gülistan Özdemir	-
dc.date.accessioned	2016-11-07 17:26:29	tr_TR
dc.date.available	2012-12-17	tr_TR
dc.date.issued	2010	-
dc.identifier.uri	https://hdl.handle.net/20.500.11851/336	-
dc.description.abstract	With the development of technology, there is an increase in size and complexity of data. The increase bring about new needs in the process of data processing. Data mining, aimed to satisfy these needs, is the process of acquiring useful information from large data warehouses. In data mining, the purpose is usually to find the frequent items which is called frequent itemset mining (FIM). Different algorithms are developed and studied for this purpose. Because of the limited physical capacity, serial algorithms aren’t sufficient when solving large data problems. So, paralel computing becomes important. With parallel algorithms, it is aimed to save on time and sources (like storage and memory). FP-Growth is a data mining algorithm that is developed to find frequent itemsets. FP-Growth is known with performing only two scans while reading database and using frequent pattern tree to find frequent items. Because of this data structure, it becomes more efficient than other frequent itemset mining algorithms. In the study, there are three different parallel approaches implemented in MPI, a library that is used to write parallel programs. The first basis thing for the approaches is whether there is a database on all of the nodes or not. The second thing is how task parallelism is done. For task, there are static and dynamical approaches. These are tested and analysed in two different clusters. According to the results, two of them are better than the other one which is sent database (tree) to the other processors. For both of them, the running time of the parallel algorithms is decreased by one forth by using two nodes. In addition to that, dynamical task parallelism is better than the static one in better algorithms.	en_US
dc.description.abstract	Teknolojinin gelişmesiyle verilerin miktarında ve çeşitliliğinde bir artış gözlemlenmiştir. Bu artış, veri işleme yöntemlerinde yeni ihtiyaçları gündeme getirmiştir. Veri madenciliği bu ihtiyaçlara cevap verebilmek amacıyla geliştirilen büyük veri ambarlarından yararlı bilgi elde etme sürecidir. Veri madenciliği kapsamında, çoklukla gereksinim duyulan ve araştırılan konulardan biri sık öge küme madenciliğidir. Sık öge küme madenciliği için farklı algoritmalar üretilmekte ve incelenmektedir. Ancak, fiziksel kapasitelerin sınırlı olmasından dolayı seri algoritmalar artan veri miktarını karşılamaya yetmemektedir. Bu durum paralel hesaplamayı gündeme getirir. Paralel algoritmalarla, zamandan ve kaynaktan (depolama, hafıza gibi) tasarruf edebilmek amaçlanır. FP-Growth, sık öge kümelerini bulmak için geliştirilen bir veri madenciliği algoritmasıdır. Literatürde FP-Growth, işlenecek veritabanını sadece iki kez tarama ve sık ögeleri bulurken kullandığı sık örüntü ağacı özelliği ile bilinir. Kullandığı bu ağaç yapısı ile daha verimli bir algoritma sunar. Bu çalışma içerisinde FP-Growth algoritmasına, paralel programlar yazmak için geliştirilen bir kütüphane olan MPI ile kodlanan üç farklı paralel yaklaşım önerilmektedir. Bu yaklaşımlarda temel alınan şey, paralel uygulama sırasında düğümlerin tümünde veritabanının mevcut olup olmadığıdır. Ek olarak, paralellik için görev dağılımı esas alındığından görev dağılımının statik ya da dinamik olması da diğer bir parametredir. Bu yaklaşımların iki farklı öbek bilgisayar üzerinde performans analizi gerçekleştirilmiştir. Çıkan sonuçlara göre, veritabanının mevcut olduğu iki yaklaşımın ağaç gönderimi esasına dayanan diğer yaklaşıma göre daha iyi sonuç verdiği gözlemlenmektedir. Her ikisi için de, paralel algoritma iki işlemciyle çalıştığı durumda bile çalışma zamanını dörtte bir oranında azaltmaktadır. Buna ek olarak, daha iyi sonuç veren iki algoritma içerisinde görev dağılımının dinamik olarak gerçekleştiği durum statik görev dağılımlı yaklaşımdan daha iyi sonuç vermektedir.	en_US
dc.language.iso	tr	en_US
dc.publisher	TOBB Ekonomi ve Teknoloji Üniversitesi - Fen Bilimleri Enstitüsü - Bilgisayar Mühendisliği Anabilim Dalı	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.source	TZ00092.pdf	tr_TR
dc.subject	Parallel computing	en_US
dc.subject	Frequent itemset mining	en_US
dc.subject	Data mining	en_US
dc.subject	FP-Growth	en_US
dc.subject	Paralel hesaplama	en_US
dc.subject	Sık öge küme madenciliği	en_US
dc.subject	Veri madenciliği	en_US
dc.title	Öbek Bilgisayarlarda Paralel Fp-growth Gerçekleştirimi	en_US
dc.title.alternative	Implementation of Parallel Fp-Growth Algorithm on Cluster Computers	en_US
dc.type	Master Thesis	en_US
dc.department	Faculties, Faculty of Engineering, Department of Computer Engineering	en_US
dc.department	Fakülteler, Mühendislik Fakültesi, Bilgisayar Mühendisliği Bölümü	en_US
dc.relation.publicationcategory	Tez	en_US
dc.identifier.scopusquality	N/A	-
dc.identifier.wosquality	N/A	-
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.fulltext	With Fulltext	-
item.grantfulltext	open	-
item.languageiso639-1	tr	-
item.openairetype	Master Thesis	-
item.cerifentitytype	Publications	-
Appears in Collections:	Bilgisayar Mühendisliği Yüksek Lisans Tezleri / Computer Engineering Master Theses