FIRE: Functional Inference using Rates of Evolution

Evolutionary Medicine Laboratory, Sydney Brenner Institute of Molecular Bioscience

University of the Witwatersrand, Johannesburg

 

FIRE version 2.0 (BLOSUM-FIRE) description

FIRE is a pairwise sequence alignment algorithm that uses the evolutionary rate (ω = dN/dS) at codon sites as an alignment metric. We hypothesise that sequences under similar selective pressures perform similar functions. The algorithm uses evolutionary rates as a proxy for selective pressure, so that sequences can be aligned even when their sequence similarity is low. To increase sensitivity, we have coupled the evolutionary-rate-based approach with a conventional approach based on a BLOSUM substitution matrix. The two are combined in a dynamic scoring function that uses the selective pressure to score aligned residues. The algorithm therefore extends the traditional technique of generating alignments from substitution matrices alone.
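The idea of scoring a residue pair from both its substitution score and its evolutionary rates can be sketched as follows. This is a minimal illustration, not the scoring function implemented in fire.py: the linear weighting, the `weight` and `scale` parameters, the function names and the four-entry lookup table (a small stand-in for a full BLOSUM matrix) are all assumptions made for the example.

```python
# Hypothetical sketch of a combined score; NOT the function used in fire.py.

# Tiny stand-in for a full BLOSUM matrix (assumption: only four pairs listed).
TOY_MATRIX = {("A", "A"): 4, ("W", "W"): 11, ("I", "L"): 2, ("A", "R"): -1}


def substitution_score(res_a, res_b, default=-4):
    """Look up a residue pair in either order; unknown pairs get `default`."""
    if (res_a, res_b) in TOY_MATRIX:
        return TOY_MATRIX[(res_a, res_b)]
    return TOY_MATRIX.get((res_b, res_a), default)


def rate_score(omega_a, omega_b, scale=4.0):
    """Map the difference in omega (dN/dS) to a score.

    Identical rates give +scale; a difference of 1.0 or more gives -scale.
    The cap at 1.0 and the linear mapping are illustrative choices.
    """
    difference = min(abs(omega_a - omega_b), 1.0)
    return scale * (1.0 - 2.0 * difference)


def combined_score(res_a, res_b, omega_a, omega_b, weight=0.5):
    """Weighted sum of the rate-based and substitution-based scores."""
    return (weight * rate_score(omega_a, omega_b)
            + (1.0 - weight) * substitution_score(res_a, res_b))


if __name__ == "__main__":
    # Same residue pair, similar versus dissimilar selective pressure.
    print(combined_score("A", "R", 0.2, 0.2))
    print(combined_score("A", "R", 0.1, 1.5))
```

In this sketch a dissimilar residue pair can still score positively when both sites evolve at similar rates, which is the behaviour the hypothesis above relies on.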

Generating an alignment requires some preprocessing. To produce a pairwise amino acid alignment, two multiple sequence alignments (MSAs) of orthologous nucleotide sequences are required, each with its corresponding phylogenetic tree file. The MSA and tree files are used as input for the CODEML program in the Phylogenetic Analysis by Maximum Likelihood (PAML) software suite, which produces the Bayes Empirical Bayes (BEB) maximum likelihood estimates (MLEs) of ω at codon sites. The CODEML rst output files are then used as input for the FIRE program. A modified Needleman-Wunsch algorithm determines the optimal global alignment and writes the output in conventional alignment file formats.
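The global alignment step can be illustrated with a textbook Needleman-Wunsch recursion applied to per-site ω values. This is a sketch under stated assumptions rather than the modified algorithm in fire.py: each site is represented as a hypothetical `(residue, omega)` tuple assumed to have been extracted from a CODEML rst file beforehand, the gap penalty is linear, and `rate_similarity` is an illustrative scoring function.

```python
# Hypothetical sketch: standard Needleman-Wunsch over (residue, omega) sites.
# The profile format, gap penalty and scoring function are assumptions.


def rate_similarity(site_a, site_b):
    """Score two sites by how close their omega values are (range -1 to +1)."""
    difference = min(abs(site_a[1] - site_b[1]), 1.0)
    return 1.0 - 2.0 * difference


def needleman_wunsch(profile_a, profile_b, score=rate_similarity, gap=-1.0):
    """Return (best score, aligned residues of a, aligned residues of b)."""
    n, m = len(profile_a), len(profile_b)
    # dp[i][j] = best score aligning the first i sites of a with the first j of b.
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(profile_a[i - 1], profile_b[j - 1]),
                dp[i - 1][j] + gap,   # gap in profile_b
                dp[i][j - 1] + gap,   # gap in profile_a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and dp[i][j] ==
                dp[i - 1][j - 1] + score(profile_a[i - 1], profile_b[j - 1])):
            row_a.append(profile_a[i - 1][0])
            row_b.append(profile_b[j - 1][0])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            row_a.append(profile_a[i - 1][0])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(profile_b[j - 1][0])
            j -= 1
    return dp[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


if __name__ == "__main__":
    # Made-up profiles for illustration only; not real CODEML output.
    profile_a = [("M", 0.1), ("K", 0.9), ("L", 0.2)]
    profile_b = [("M", 0.1), ("L", 0.2)]
    total, aligned_a, aligned_b = needleman_wunsch(profile_a, profile_b)
    print(total)
    print(aligned_a)
    print(aligned_b)
```

Because `score` is passed in as a parameter, a function that also consults a substitution matrix could be substituted without changing the recursion.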

 

Download the FIRE program

  • The FIRE software consists of two Python (2.x) files: the FIRE program (fire.py) and the matrix file (Blosum_matrix.py). Click here to download both files in gzip or zip format.

  • In addition to the FIRE software, we also provide the FIRE user information for download.

  • The old version of the program can be found here.

  • A raw CODEML rst output file before extraction of the dN/dS profiles for the Rubella virus endopeptidase domain can be found here.

  • Some data files used in our recent study: Rubella virus endopeptidase domain and Hepatitis B X (HBx) protein.

  • Sample data files: Kappa and Lambda antibody variable regions.

  • To determine the evolutionary rates we use the CODEML program in the PAML software suite; its documentation can be found (external link) here.

  • To infer the functions of novel proteins, we developed a custom database of evolutionary profiles for Pfam-A. The database, EvoDB, has been bundled into a general-purpose resource and can be found here.

Citing FIRE

Durand PM, Hazelhurst S, Coetzer TL. Evolutionary rates at codon sites may be used to align sequences and infer protein domain function. BMC Bioinformatics. 2010;11(1):151.

A paper describing the new BLOSUM-FIRE algorithm is in review and will be made available for download once published.