NICER (Nucleotide Identity Comparisons for Evolution and Recombination) is a program that automatically determines a number of parameters for evaluating the quality of inferred phylogenies (evolutionary trees). It was originally developed by Dr. Lester M. Shulman and Dr. Jaime Prilusky to evaluate whether or not a cluster of vaccine-derived polioviruses evolved as part of a single epidemiological event. Use of the NICER program is free; however, we ask that if you use results from NICER in your publications, you reference either this website or the following publication.

Analysis. NICER was designed to perform three types of analysis: (1) determination of the pattern of identical substitutions from a reference strain in two or more query sequences, (2) confirmation of genomic recombination between known progenitors in one or more query sequences by analysis of only those sites at which the two progenitor reference strains differ, and (3) determination of parameters that can be used to calculate evolutionary rates [e.g., the number of synonymous 3rd codon position nucleotide substitutions, because such substitutions are generally not subject to selective pressure] or that characterize the types of polymerase substitution errors [e.g., the ratio of transitions to transversions].
NICER prepares multiple alignments of the reference(s) and query sequences for the 1st or 2nd type of analysis. If a single reference strain is uploaded, the 1st analysis is run when the “Analyze” button is selected, whereas the 2nd analysis is run if two reference sequences are uploaded. The 3rd analysis is performed if the “Synonymous” button is activated; it uses the same alignment algorithm to prepare a pairwise alignment between the chosen query sequence and Reference 1. The user can specify the nucleotide positions for the “start” and “end” of the analysis within all alignments. For the 3rd type of analysis, the user must also indicate the reading frame for the ORF if the “start” nucleotide for the analysis is not the first nucleotide within its codon.

Access. NICER is invoked through a web browser. NICER is menu-driven. It currently accepts sequence data from individual sequence files located on the local computer, executes on the server at the INN, and returns the output to the web browser page, from which it can be printed or saved.

Input files. The NICER web page requires entry of one or two reference sequences and allows entry of between one and five query sequences, with an option to expand the number of query sequences. Expansion must be enabled before any sequence data are entered. Reference and query input sequences, in FASTA format, are uploaded to NICER one at a time from individual plain text files (created with Notepad, TextWrangler, BBEdit, etc.) stored on your local computer, using a browsing function enabled from the web page [e.g., by clicking the “Choose File” button]. Entries must have unique names. The name consists of all characters after the “>” until the first space. [We envision future versions where it will be possible for NICER to accept a file of sequences where the sequences appear one after the other or are pre-aligned and interleaved.]
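
The naming rule above can be checked locally before uploading. The following is only a minimal sketch, not NICER's own parser: the function names read_fasta_record and check_unique_names are hypothetical, and the sketch assumes each file holds exactly one FASTA record.

```python
def read_fasta_record(text):
    """Parse a single-record FASTA string into (name, sequence).

    Hypothetical helper for illustration; not part of NICER.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("a FASTA record must begin with a '>' header line")
    header = lines[0][1:]
    # The entry name is everything after '>' up to the first space.
    name = header.split(" ", 1)[0]
    # Remaining lines are sequence data; join them and normalize case.
    sequence = "".join(lines[1:]).upper()
    return name, sequence


def check_unique_names(records):
    """Raise ValueError if two (name, sequence) records share a name."""
    seen = set()
    for name, _ in records:
        if name in seen:
            raise ValueError(f"duplicate entry name: {name}")
        seen.add(name)
```

For example, a header line such as ">Sabin1 type 1 vaccine strain" would yield the name "Sabin1", so two files whose headers both start with ">Sabin1 " would clash even if the rest of the header differs.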

Results. Currently, output from NICER appears on a new web page in the same window and can be downloaded or printed.

Analysis Type 1. NICER prepares a multiple alignment of reference and query sequences and determines all loci where at least one of the query sequences differs from Reference 1. At each of those loci, NICER determines how many of the query sequences have substitutions and the highest number of those substitutions that are identical. Results of this analysis are presented in three ways. The lower portion contains an entry for each locus at which at least one of the query sequences contained a nucleotide substitution. Each entry lists the position relative to Reference 1, the highest number of query sequences with an identical substitution, the maximum number of identical changes per number of changed query sequences, and the maximum number of identical changes per number of total query sequences together with its %. The upper portion of the output is a Graphic Frequency Table that ranks the maximum number of identical changes per number of total query sequences from highest to lowest and indicates the number of occurrences of each. The middle section of the output is a Distribution Map that indicates the maximum % of identical changes per total number of query sequences at each locus where there is a substitution.
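
The per-locus counts described above can be illustrated with a short sketch. It is not NICER's implementation. It assumes the sequences are already aligned to equal length (NICER performs its own alignment), it treats a gap character like any other symbol, and the names identical_substitution_profile and frequency_table are hypothetical.

```python
from collections import Counter


def identical_substitution_profile(reference, queries):
    """Return one row per locus where at least one query differs from the reference.

    Assumes `reference` and every entry of `queries` are pre-aligned strings
    of equal length (an assumption of this sketch).
    """
    rows = []
    for i, ref_base in enumerate(reference):
        # Bases of the queries that differ from the reference in this column.
        changes = [q[i] for q in queries if q[i] != ref_base]
        if not changes:
            continue
        # Highest number of queries sharing the same substituted base.
        max_identical = Counter(changes).most_common(1)[0][1]
        rows.append({
            "position": i + 1,            # 1-based, relative to the reference
            "max_identical": max_identical,
            "changed": len(changes),      # queries with any substitution here
            "total": len(queries),
            "percent": 100.0 * max_identical / len(queries),
        })
    return rows


def frequency_table(rows):
    """Rank max_identical values from highest to lowest with their occurrence counts."""
    counts = Counter(row["max_identical"] for row in rows)
    return sorted(counts.items(), reverse=True)
```

With four query sequences, a row whose max_identical is 4 corresponds to a locus where all queries carry the same substitution. The frequency table then shows how many such loci exist compared with loci where fewer queries agree.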

Interpretation.
Given “n” query sequences, the number and distribution of loci where “n” of “n” and “n-1” of “n” sequences have identical substitutions gives an indication of whether the query sequences as a group evolved along a common pathway. The higher the frequency of loci with the highest number of identical substitutions, and the more dispersed they are along the segment analyzed, the more likely the sequences are evolutionarily and epidemiologically related. A discontinuous distribution may indicate genomic recombination. If there is a low frequency of high % of identical substitutions and a high frequency of mid and low % identity, then the query sequences are less likely to represent a single epidemiological event.

Analysis Type 2. NICER prepares a multiple alignment of two reference sequences and the query sequences, determines all nucleotide loci at which the two reference strains differ, and then determines whether the corresponding nucleotide in each query sequence is identical to the first reference, the second reference, or neither reference. The output is presented in two ways. The upper part of the output is an alignment showing only loci where the references differ, in which all nucleotides in the query sequences that are identical to Reference 1 are represented by “1’s”, all positions identical to Reference 2 are represented by “2’s”, and all positions different from both references are represented by “0’s”. The lower part represents the same alignment except that, instead of numbers, identity is indicated by colored pixels.
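
The 1/2/0 coding can be sketched in a few lines. This is an illustration only and not NICER's code: it assumes the two references and the query are already aligned to equal length, and the function name recombination_codes is hypothetical.

```python
def recombination_codes(ref1, ref2, query):
    """Code the query at every locus where the two references differ.

    Returns (positions, codes): 1-based positions of informative loci and a
    string with '1' (matches Reference 1), '2' (matches Reference 2) or
    '0' (matches neither). Inputs are assumed to be pre-aligned.
    """
    positions = []
    codes = []
    for i, (a, b, q) in enumerate(zip(ref1, ref2, query)):
        if a == b:
            # Loci where the references agree carry no information here.
            continue
        positions.append(i + 1)
        if q == a:
            codes.append("1")
        elif q == b:
            codes.append("2")
        else:
            codes.append("0")
    return positions, "".join(codes)
```

Running the function once per query sequence and stacking the resulting strings gives a compact text analogue of the upper part of the output.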

Interpretation.
A discontinuity, e.g., a consecutive cluster of different numbers or colors along the length of any of the query sequences, is evidence consistent with genomic recombination. The recombination is between the reference strain genomes if consecutive loci before and after the discontinuity are identical with one and then the other reference strain. If a cluster in the query sequence is identical with one reference on one side of the discontinuity but different from both references on the other side, there is still evidence for a recombination; however, the recombination most likely occurred with a genome different from that of the second reference. Finally, because recombination is infrequent, highly similar or identical discontinuity patterns in more than one query sequence strongly indicate that those query sequences share a similar evolutionary pathway.
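
NICER presents these patterns for visual inspection. As an aid to reading a long string of codes, the runs can be summarized programmatically. The sketch below is a hypothetical helper, not a feature of NICER. The min_run threshold of 3 is an arbitrary assumption that ignores isolated point differences, and the breakpoints it reports are only candidates for closer inspection.

```python
from itertools import groupby


def code_runs(codes):
    """Run-length encode a code string, e.g. '1122' -> [('1', 2), ('2', 2)]."""
    return [(code, len(list(group))) for code, group in groupby(codes)]


def candidate_breakpoints(codes, min_run=3):
    """List switches between two adjacent runs that are each >= min_run long.

    Returns tuples (index, code_before, code_after), where index is the
    0-based position in `codes` at which the second run starts.
    min_run is an assumed, adjustable threshold.
    """
    runs = code_runs(codes)
    breakpoints = []
    index = 0
    for (code_a, len_a), (code_b, len_b) in zip(runs, runs[1:]):
        index += len_a  # start of the run that follows code_a's run
        if len_a >= min_run and len_b >= min_run:
            breakpoints.append((index, code_a, code_b))
    return breakpoints
```

A switch from a run of "1" to a run of "2" matches the case of recombination between the two references. A switch from "1" or "2" to a run of "0" matches the case where the partner genome probably differs from the second reference.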

Analysis Type 3. NICER aligns a query sequence with Reference 1, assuming that the query sequence has evolved from Reference 1, and then counts the number of substitutions that are transitions (purine to purine or pyrimidine to pyrimidine) and the number that are transversions (purine to pyrimidine, or pyrimidine to purine). NICER also determines the number of 3rd codon position changes that do not encode an amino acid change. This part of the output can be ignored if the sequence is not part of an open reading frame (ORF). If the sequence represents an ORF, the output is based on the user-indicated frame within the codon for the starting nucleotide of the sequence.
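
Both counts can be illustrated with a short sketch. It is not NICER's implementation. It assumes a gap-free pairwise alignment written in the DNA alphabet (A, C, G, T), uses the standard genetic code, counts only codons whose sole difference is at the 3rd position, and interprets the frame parameter as the codon position (1, 2 or 3) of the first nucleotide supplied. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

PURINES = {"A", "G"}


def count_ts_tv(reference, query):
    """Count transitions and transversions between two aligned sequences."""
    transitions = transversions = 0
    for r, q in zip(reference, query):
        # Skip identical sites and anything that is not A, C, G or T.
        if r == q or r not in "ACGT" or q not in "ACGT":
            continue
        if (r in PURINES) == (q in PURINES):
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    return transitions, transversions


def count_synonymous_third(reference, query, frame=1):
    """Count codons that differ only at position 3 and encode the same amino acid.

    frame: codon position (1, 2 or 3) of the first nucleotide (assumed meaning).
    """
    offset = (4 - frame) % 3  # bases to skip to reach the first full codon
    count = 0
    for i in range(offset, min(len(reference), len(query)) - 2, 3):
        ref_codon, qry_codon = reference[i:i + 3], query[i:i + 3]
        if ref_codon not in CODON_TABLE or qry_codon not in CODON_TABLE:
            continue
        only_third_differs = (ref_codon[:2] == qry_codon[:2]
                              and ref_codon[2] != qry_codon[2])
        if only_third_differs and CODON_TABLE[ref_codon] == CODON_TABLE[qry_codon]:
            count += 1
    return count
```

For a sequence that begins at the second nucleotide of a codon, calling count_synonymous_third(reference, query, frame=2) skips the two leading bases of the partial codon before counting.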

Interpretation.
The relative % of substitutions that are transitions and transversions is important to determine since this is a property of the nucleotide polymerase. Synonymous nucleotide substitutions are in general not subject to selective pressure (an exception might be a codon with a very low amount of corresponding tRNA). The rate of accumulation of such substitutions has been used to calculate evolutionary time. The order in which multiple substitutions occur within a codon cannot be determined, so such codons are excluded from this analysis.
Repeated empirical observations can establish the rate of accumulation of synonymous 3rd codon position substitutions and the ratio of transitions to transversions for a given species. When the interval between sampling is known, a significantly higher or lower rate of synonymous substitutions and, to a lesser extent, an inconsistent transition-to-transversion ratio provide evidence against direct evolution of the query sequence from the reference.