PMID- 15961499
OWN - NLM
STAT- In-Data-Review
DA  - 20050617
PUBM- Print
IS  - 1367-4803
VI  - 21 Suppl 1
DP  - 2005 Jun 1
TI  - ExonHunter: a comprehensive approach to gene finding.
PG  - i57-i65
AB  - MOTIVATION: We present ExonHunter, a new and comprehensive gene finding system that outperforms existing systems and features several new ideas and approaches. Our system combines numerous sources of information (genomic sequences, expressed sequence tags and protein databases of related species) into a gene finder based on a hidden Markov model in a novel and systematic way. In our framework, various sources of information are expressed as partial probabilistic statements about positions in the sequence and their annotation. We then combine these into the final prediction via a quadratic programming method, which we show to be an extension of existing methods. Allowing only partial statements is key to our transparent handling of missing information and coping with the heterogeneous character of individual sources of information. In addition, we give a new method for modeling the length distribution of intergenic regions in hidden Markov models. RESULTS: On a commonly used test set, ExonHunter performs significantly better than the existing gene finders ROSETTA, SLAM and TWINSCAN, with more than two-thirds of genes predicted completely correctly. AVAILABILITY: Supplementary material available at http://www.bioinformatics.uwaterloo.ca/supplements/05eh/ CONTACT: bbrejova@uwaterloo.ca.
AD  - School of Computer Science, University of Waterloo 200 University Avenue West, Waterloo, ON, Canada N2L 3G1.
FAU - Brejova, Brona
AU  - Brejova B
FAU - Brown, Daniel G
AU  - Brown DG
FAU - Li, Ming
AU  - Li M
FAU - Vinar, Tomas
AU  - Vinar T
LA  - eng
PT  - Journal Article
PL  - England
TA  - Bioinformatics
JID - 9808944
SB  - IM
EDAT- 2005/06/18 09:00
MHDA- 2005/06/18 09:00
AID - 21/suppl_1/i57 [pii]
AID - 10.1093/bioinformatics/bti1040 [doi]
PST - ppublish
SO  - Bioinformatics 2005 Jun 1;21 Suppl 1:i57-i65.

PMID- 15290755
OWN - NLM
STAT- MEDLINE
DA  - 20040803
DCOM- 20040917
LR  - 20041117
PUBM- Print
IS  - 0219-7200
VI  - 1
IP  - 4
DP  - 2004 Jan
TI  - Optimal spaced seeds for homologous coding regions.
PG  - 595-610
AB  - Optimal spaced seeds were developed as a method to increase sensitivity of local alignment programs similar to BLASTN. Such seeds have been used before in the program PatternHunter, and have given improved sensitivity and running time relative to BLASTN in genome-genome comparison. We study the problem of computing optimal spaced seeds for detecting homologous coding regions in unannotated genomic sequences. By using well-chosen seeds, we are able to improve the sensitivity of coding sequence alignment over that of TBLASTX, while keeping runtime comparable to BLASTN. We identify good seeds by first giving effective hidden Markov models of conservation in alignments of homologous coding regions. We give an efficient algorithm to compute the optimal spaced seed when conservation patterns are generated by these models. Our results offer the hope of improved gene finding due to fewer missed exons in DNA/DNA comparison, and more effective homology search in general, and may have applications outside of bioinformatics.
AD  - School of Computer Science, University of Waterloo, 200 University Ave West, Waterloo, ON N2L3G1, Canada. bbrejova@math.uwaterloo.ca
FAU - Brejova, Brona
AU  - Brejova B
FAU - Brown, Daniel G
AU  - Brown DG
FAU - Vinar, Tomas
AU  - Vinar T
LA  - eng
PT  - Journal Article
PL  - England
TA  - J Bioinform Comput Biol
JID - 101187344
RN  - 0 (Proteins)
RN  - 9007-49-2 (DNA)
SB  - IM
MH  - Algorithms
MH  - Animals
MH  - Comparative Study
MH  - Computational Biology
MH  - DNA/genetics
MH  - Drosophila/genetics
MH  - Humans
MH  - Markov Chains
MH  - Mice
MH  - Models, Statistical
MH  - Proteins/genetics
MH  - Research Support, Non-U.S. Gov't
MH  - Sensitivity and Specificity
MH  - Sequence Alignment/*statistics & numerical data
MH  - *Software
EDAT- 2004/08/04 05:00
MHDA- 2004/09/21 05:00
PHST- 2003/02/01 [received]
PHST- 2003/06/24 [accepted]
PHST- 2003/06/23 [revised]
AID - S0219720004000326 [pii]
PST - ppublish
SO  - J Bioinform Comput Biol 2004 Jan;1(4):595-610.