Khepera, P.O. Box 917, Hanover, NH 03755, U.S.A.
This paper explores how nucleotide complementarity encodes information about protein structure by examining the relationship between amino acid pairing indices and codon complementarity relationships. A framework for understanding sequence-structure relationships through the genetic code is described. The four ways of reading the information in a strand of DNA (sense strand: ID = 5'->3', RID = 3'->5'; antisense strand: WC = 5'->3', RWC = 3'->5') are used to classify complementarity relationships between pairs of codons. Amino acids are shown to have poor pairing potentials (PPs) with the amino acids encoded through some types of complementarity relationship, and good PPs with those encoded through others. An amino acid can have several complementarity pairs within each classification. The PPs within each such family tend to be qualitatively similar, so that their average pairing potentials (‹PP›) fall into predictable classes. The relationships among the four decodings are symmetrical with respect to the neutral pairing potential, so that PP_ID ≈ -PP_WC and PP_RID ≈ -PP_RWC. This symmetry is broken by charged amino acids and by amino acids with special roles in protein structure formation. In the coding sequences for sperm whale myoglobin, hen lysozyme and cow gamma crystallin, the patterns in 2D graphical matrices of codon complementarity reveal information about the encoded protein structures. The topological complexity of protein-protein interactions makes these 2D plots more complicated to interpret than the widely used matrix comparison graphs of DNA, RNA and protein sequences, or the 2D contact maps derived from the coordinates of protein structures. Some approaches to analyzing the structural information in these graphical displays are presented, and potential applications in gene identification and structure prediction are discussed.
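The four readings used to classify codon pairs can be illustrated with a minimal Python sketch. It rests on one interpretation of the definitions above, which is an assumption rather than something the abstract states in code form: ID is the codon as written on the sense strand, RID is the same codon read backwards, WC is the reverse complement (the antisense strand read 5'->3'), and RWC is the plain complement (the antisense strand read 3'->5'). The function names `four_readings` and `relationships` are hypothetical and are not taken from the paper.

```python
# Watson-Crick complement for each DNA base.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def four_readings(codon):
    """Return the four readings of a sense-strand codon written 5'->3'.

    Assumed interpretation (not confirmed by the source):
      ID  : sense strand read 5'->3'      (the codon itself)
      RID : sense strand read 3'->5'      (reversed codon)
      WC  : antisense strand read 5'->3'  (reverse complement)
      RWC : antisense strand read 3'->5'  (complement)
    """
    codon = codon.upper()
    complement = "".join(COMPLEMENT[base] for base in codon)
    return {
        "ID": codon,
        "RID": codon[::-1],
        "WC": complement[::-1],
        "RWC": complement,
    }


def relationships(codon_a, codon_b):
    """List every reading of codon_a that reproduces codon_b."""
    target = codon_b.upper()
    return [label for label, seq in four_readings(codon_a).items() if seq == target]


if __name__ == "__main__":
    print(four_readings("ATG"))
    print(relationships("ATG", "CAT"))
```

A codon that reads the same in both directions, such as ATA, matches its partner under two labels at once, so a classification built on this sketch would need a rule for such ties. Translating each reading into an amino acid and attaching pairing potentials would require the genetic code table and the PP values from the paper, which are not reproduced here.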