Improved Benchmarks for Computational Motif Discovery

Sandve, Geir Kjetil; Abul, Osman; Walseng, Vegard; Drablos, Finn

Please use this identifier to cite or link to this item: https://hdl.handle.net/20.500.11851/6874

Full metadata record

DC Field	Value	Language
dc.contributor.author	Sandve, Geir Kjetil	-
dc.contributor.author	Abul, Osman	-
dc.contributor.author	Walseng, Vegard	-
dc.contributor.author	Drablos, Finn	-
dc.date.accessioned	2021-09-11T15:44:01Z	-
dc.date.available	2021-09-11T15:44:01Z	-
dc.date.issued	2007	-
dc.identifier.issn	1471-2105	-
dc.identifier.uri	https://doi.org/10.1186/1471-2105-8-193	-
dc.identifier.uri	https://hdl.handle.net/20.500.11851/6874	-
dc.description.abstract	Background: An important step in annotation of sequenced genomes is the identification of transcription factor binding sites. More than a hundred different computational methods have been proposed, and it is difficult to make an informed choice. Therefore, robust assessment of motif discovery methods becomes important, both for validation of existing tools and for identification of promising directions for future research. Results: We use a machine learning perspective to analyze collections of transcription factors with known binding sites. Algorithms are presented for finding position weight matrices (PWMs), IUPAC-type motifs and mismatch motifs with optimal discrimination of binding sites from remaining sequence. We show that for many data sets in a recently proposed benchmark suite for motif discovery, none of the common motif models can accurately discriminate the binding sites from remaining sequence. This may obscure the distinction between the potential performance of the motif discovery tool itself versus the intrinsic complexity of the problem we are trying to solve. Synthetic data sets may avoid this problem, but we show on some previously proposed benchmarks that there may be a strong bias towards a presupposed motif model. We also propose a new approach to benchmark data set construction. This approach is based on collections of binding site fragments that are ranked according to the optimal level of discrimination achieved with our algorithms. This allows us to select subsets with specific properties. We present one benchmark suite with data sets that allow good discrimination between positive and negative instances with the common motif models. These data sets are suitable for evaluating algorithms for motif discovery that rely on these models. We present another benchmark suite where PWM, IUPAC and mismatch motif models are not able to discriminate reliably between positive and negative instances. This suite could be used for evaluating more powerful motif models. Conclusion: Our improved benchmark suites have been designed to differentiate between the performance of motif discovery algorithms and the power of motif models. We provide a web server where users can download our benchmark suites, submit predictions and visualize scores on the benchmarks.	en_US
dc.language.iso	en	en_US
dc.publisher	BMC	en_US
dc.relation.ispartof	BMC Bioinformatics	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.subject	[No Keywords]	en_US
dc.title	Improved Benchmarks for Computational Motif Discovery	en_US
dc.type	Article	en_US
dc.department	Faculties, Faculty of Engineering, Department of Computer Engineering	en_US
dc.identifier.volume	8	en_US
dc.authorid	0000-0001-5794-828X	-
dc.authorid	0000-0002-4959-1409	-
dc.identifier.wos	WOS:000247791600002	-
dc.identifier.scopus	2-s2.0-34347339593	-
dc.institutionauthor	Abul, Osman	-
dc.identifier.pmid	17559676	-
dc.identifier.doi	10.1186/1471-2105-8-193	-
dc.relation.publicationcategory	Article - International Refereed Journal - Institutional Faculty Member	en_US
dc.identifier.scopusquality	Q3	-
dc.identifier.wosquality	Q2	-
item.openairecristype	http://purl.org/coar/resource_type/c_18cf	-
item.fulltext	No Fulltext	-
item.grantfulltext	none	-
item.languageiso639-1	en	-
item.openairetype	Article	-
item.cerifentitytype	Publications	-
crisitem.author.dept	02.3. Department of Computer Engineering	-
Appears in Collections:	Department of Computer Engineering; PubMed Indexed Publications Collection; Scopus Indexed Publications Collection; WoS Indexed Publications Collection
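
The abstract names three common motif models: position weight matrices (PWMs), IUPAC-type motifs and mismatch motifs. As a rough illustration of how each model decides whether a sequence window looks like a binding site, here is a minimal Python sketch. It is not the authors' optimal-discrimination algorithm; the helper names, the uniform background of 0.25, the toy PWM and the toy sequence are all assumptions made for this example.

```python
import math

# Standard IUPAC nucleotide codes mapped to the bases each code allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def windows(seq, width):
    """Return every contiguous window of the given width in seq."""
    return [seq[i:i + width] for i in range(len(seq) - width + 1)]


def pwm_score(pwm, window, background=0.25):
    """PWM model: sum of per-position log-odds against a uniform background.

    `pwm` is a list of dicts, one per motif position, mapping base to
    probability (hypothetical layout chosen for this sketch).
    """
    return sum(math.log2(col[base] / background)
               for col, base in zip(pwm, window))


def iupac_match(motif, window):
    """IUPAC model: every base must be allowed by the code at its position."""
    return len(motif) == len(window) and all(
        base in IUPAC[code] for code, base in zip(motif, window))


def mismatch_match(consensus, window, max_mismatches):
    """Mismatch model: Hamming distance to the consensus within a limit."""
    if len(consensus) != len(window):
        return False
    return sum(c != b for c, b in zip(consensus, window)) <= max_mismatches


# Toy data, invented for illustration only.
toy_pwm = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
]
toy_seq = "TTACGTT"
toy_windows = windows(toy_seq, 3)

best_pwm_window = max(toy_windows, key=lambda w: pwm_score(toy_pwm, w))
iupac_hits = [w for w in toy_windows if iupac_match("RCG", w)]
mismatch_hits = [w for w in toy_windows if mismatch_match("ACG", w, 1)]
```

A PWM gives a graded score that has to be thresholded, whereas the IUPAC and mismatch models give a yes/no answer per window. In each case, discrimination in the sense of the abstract amounts to how well the model separates known binding-site windows from the remaining sequence.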

SCOPUS citations: 57 (checked on Sep 13, 2025)
Web of Science citations: 51 (checked on Sep 6, 2025)
Page views: 198 (checked on Sep 15, 2025)