Baylor College of Medicine, Houston, TX 77030*
Genome sequencing projects generate long genomic sequences that often contain several genes. The Baylor College of Medicine GeneFinder programs have been further developed to predict multiple genes. We calculate new linear discriminant functions to predict 5'-, internal and 3'-exons for 4 groups of sequences with different G+C composition. We introduce a set of rules a) for resolving competition between overlapping exons and b) for joining neighboring exons. The dynamic programming algorithm was changed from a search of an acyclic graph of compatible exons to a linear-time algorithm that uses the maximal path through the preceding exons for each of the 3 possible reading frames (on one DNA strand). The new program, FGENES, reaches about 93% accuracy at the nucleotide level (Sn = 92%, Sp = 94%) and 84% accuracy at the level of exact exon prediction (Sn = 84.7%, Sp = 84%) on the Burset/Guigo set of 570 genes. Gene-level accuracy was 57%, higher than that observed for existing programs that do not use sequence homology information. Promoter and poly-A site prediction are embedded in the program, which helps to recognize multiple genes located in one sequence. Prediction of TATA-box-containing and TATA-less promoters will be discussed.
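The abstract does not state the recurrence behind the linear-time dynamic programming step. The following is a minimal sketch of one way a frame-indexed sweep of this kind could work, not the FGENES implementation. It assumes that candidate exons on one strand are already scored (for example by discriminant functions), that any exon may begin or end a chain, and that two exons are compatible when the intron between them is at least `min_intron` bases and the reading frame carries over. The `Exon` fields, the `phase_in` convention, the `min_intron` default and `best_exon_chain` are all hypothetical. The exon-type rules (5'-, internal, 3'-exon) and the overlap and joining rules are left out.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Exon:
    start: int      # first base, inclusive (hypothetical coordinate convention)
    end: int        # last base, inclusive
    phase_in: int   # codon bases already read before `start`: 0, 1 or 2 (assumed)
    score: float    # precomputed exon score, e.g. from a discriminant function

    @property
    def phase_out(self) -> int:
        # Codon bases left pending after the exon's last base.
        return (self.phase_in + self.end - self.start + 1) % 3


def best_exon_chain(exons: List[Exon], min_intron: int = 20) -> Tuple[float, List[Exon]]:
    """Highest-scoring frame-consistent chain of non-overlapping exons.

    Sorting costs O(n log n); the sweep itself is linear, because each exon
    is scored once and admitted into the per-frame maxima once.
    Assumes min_intron >= 0.
    """
    if not exons:
        return 0.0, []
    n = len(exons)
    by_start = sorted(range(n), key=lambda k: exons[k].start)
    by_end = sorted(range(n), key=lambda k: exons[k].end)
    chain = [0.0] * n                        # best chain score ending with exon k
    prev: List[Optional[int]] = [None] * n   # back-pointer to the preceding exon
    best = [float("-inf")] * 3               # best admitted chain per outgoing phase
    best_idx: List[Optional[int]] = [None] * 3
    j = 0
    for i in by_start:
        e = exons[i]
        # Admit every exon followed by at least min_intron intron bases before e.
        # Such an exon starts strictly before e, so its chain score is final.
        while j < n and exons[by_end[j]].end + min_intron < e.start:
            k = by_end[j]
            f = exons[k].phase_out
            if chain[k] > best[f]:
                best[f], best_idx[f] = chain[k], k
            j += 1
        f = e.phase_in
        if best[f] > 0:
            # Extend the best chain whose outgoing phase matches e's frame.
            chain[i], prev[i] = e.score + best[f], best_idx[f]
        else:
            chain[i] = e.score  # no useful compatible prefix: start a new chain
    top = max(range(n), key=lambda k: chain[k])
    path: List[Exon] = []
    cur: Optional[int] = top
    while cur is not None:
        path.append(exons[cur])
        cur = prev[cur]
    return chain[top], path[::-1]
```

Keeping only three running maxima, one per reading frame, is what makes the sweep linear. Each new exon is compared with a single stored value for its own frame instead of with every earlier compatible exon, as a search over an explicit graph of compatible exons would do.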
* Current address: Sanger Centre, Hinxton, Cambridge CB10 1SA, UK