An Automatic Graded Partitioning of all Proteins

Michal Linial (1), Nathan Linial (2), Naftali Tishby (2) and Golan Yona (2)

(1) Department of Biological Chemistry, Institute of Life Sciences, Hebrew University, Jerusalem 91904, Israel.

(2) Institute of Computer Science, Hebrew University, Jerusalem 91904, Israel.

We investigate the space of all protein sequences. Based on currently known measures of similarity, we create for each sequence an exhaustive list of neighboring sequences. These lists induce a weighted directed graph whose vertices are the sequences. The weight of an edge connecting two sequences represents their degree of similarity. This graph encodes many of the fundamental properties of the sequence space. The idea underlying our work is that interesting homologies among proteins can be deduced by transitivity.

If we eliminate all edges that do not pass a certain significance threshold, the graph splits into connected components. These automatically induced sets of proteins are closely correlated with natural biological families. By performing this procedure at varying thresholds, we obtain a hierarchical organization of the connected components, and thus of all known proteins.
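The thresholding step can be illustrated with a small sketch. This is not the authors' implementation: it assumes each edge carries a score where a higher value means a more significant similarity, it ignores edge direction when forming components, and the function names, sequence labels, and scores are hypothetical.

```python
from collections import defaultdict


def components_at_threshold(vertices, edges, threshold):
    """Connected components after dropping edges scoring below threshold.

    Assumption: higher score = more significant similarity.
    Edge direction is ignored (weak connectivity), also an assumption.
    """
    parent = {v: v for v in vertices}

    def find(v):
        # Union-find lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for u, v, score in edges:
        if score >= threshold:
            root_u, root_v = find(u), find(v)
            if root_u != root_v:
                parent[root_u] = root_v

    groups = defaultdict(set)
    for v in vertices:
        groups[find(v)].add(v)
    return sorted(sorted(group) for group in groups.values())


def graded_partition(vertices, edges, thresholds):
    """Map each threshold to the partition it induces."""
    return {t: components_at_threshold(vertices, edges, t) for t in thresholds}


# Toy data: five hypothetical sequences with made-up similarity scores.
vertices = ["A", "B", "C", "D", "E"]
edges = [("A", "B", 0.9), ("B", "C", 0.5), ("D", "E", 0.8), ("C", "D", 0.1)]

hierarchy = graded_partition(vertices, edges, [0.7, 0.3, 0.05])
for t, parts in hierarchy.items():
    print(t, parts)
```

Because lowering the threshold only adds edges, every component at a stricter threshold is contained in exactly one component at a looser threshold, which is what makes the resulting organization hierarchical. In the toy data, A and C never share a direct edge above 0.3, yet they land in the same component through B, which is the transitivity idea in miniature.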

The results show that this method successfully identifies many biological families. By varying the threshold of statistical significance, we discover finer sub-families that make up known families of proteins. This procedure also exposes specific linkage proteins or ancestor proteins. The analysis reveals many interesting relations between protein families and suggests a hierarchical organization within them.

An interactive web site including the results of our analysis is under construction. New sequences can be analyzed to get a full description of the corresponding components. It is also possible to investigate the effect of a new sequence on the entire space.