Genes are commonly identified in genomic sequence data based on the similarity of their sequences to those of known genes. While this is often relatively straightforward, in many cases the similarity is so distant that even the assignment of a new gene to a protein family may be less than compelling. One approach to these difficult situations is to add information from an entire protein family to the analysis. Combinations of supervised and unsupervised learning, and of fixed-length and gapped comparisons, provide powerful tools for working with distantly related sequences and sequence families. These tools can be applied to gene identification, family classification, and candidate gene analysis.