Yeast Genetics, The Sanger Centre, Wellcome Trust, Genome Campus, Hinxton, Cambridge, CB10 1SA
The S. pombe genome project was initiated at the Sanger Centre in 1995. Approximately 66% of chromosome I has been sequenced with funding from the Wellcome Trust. The European Commission has subsequently funded the Sanger Centre, together with 12 other European sequencing laboratories, to continue sequencing the S. pombe genome. The project is coordinated by the Sanger Centre and is due to be completed by the year 2000 [1].
At the outset of the S. pombe genome project, it was necessary to collate data from previous studies of this organism. A method for predicting gene structure was also required, as approximately 40% of previously analyzed genes were known to be spliced.
The ACEDB system has been chosen to support the informatics needs of this project. S. pombe molecular biology data including genetic maps, physical maps, references, protein sequences, nucleotide sequences, and gene information have been collated into a database named Pombase which is available from the Sanger Centre ftp site [2]. The genefinder tool within ACEDB has been configured to identify S. pombe protein coding regions.
Since the beginning of the project, a total of 2.9 Mb of fully annotated sequence data has been deposited in the EMBL database. Data from the 70 most recently analyzed chromosome I cosmids have revealed 914 predicted coding sequences. Of these, 52% were known genes or had homology to known genes, and a further 23% had homology to genes of unknown function. The remaining 25% had no homology to any database entry, although this category includes a number of questionable ORFs with low coding potential.
Preliminary analysis has also revealed a gene density of approximately one gene per 2.3 kb. The size of the genome excluding the rDNA is estimated at 12.3-12.8 Mb; extrapolating from this gene density, we expect 5325-5541 genes in the S. pombe genome.
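The extrapolation above is a simple division of genome size by the average spacing between genes. The sketch below reproduces it. It assumes a density of about 2.31 kb per gene, which we take to be the unrounded figure behind the quoted range. Using the rounded 2.3 kb instead gives slightly higher estimates, roughly 5348-5565 genes.

```python
# Extrapolate the total gene count from genome size and average gene density.
# Assumption: ~2.31 kb per gene, taken to be the unrounded value behind the
# quoted "one gene per 2.3 kb" (it reproduces the stated 5325-5541 range).

def expected_genes(genome_mb: float, kb_per_gene: float) -> int:
    """Return the estimated number of genes: genome size (kb) / kb per gene."""
    genome_kb = genome_mb * 1000  # convert Mb to kb
    return round(genome_kb / kb_per_gene)

KB_PER_GENE = 2.31  # assumed unrounded density

# Genome size excluding rDNA is estimated at 12.3-12.8 Mb.
low = expected_genes(12.3, KB_PER_GENE)
high = expected_genes(12.8, KB_PER_GENE)

print(f"Expected genes: {low}-{high}")  # Expected genes: 5325-5541
```

Because the estimate scales inversely with spacing, a small change in the assumed kb per gene shifts the range by tens of genes. The preliminary density should therefore be treated as approximate until more of the genome is annotated.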
[1] http://www.sanger.ac.uk/Projects/S_pombe/