
AFP (Aligned Fragment Pair)

Given two protein structures, a match between two fragments (each 8 aa long), one from each protein, is called an Aligned Fragment Pair (AFP). Each AFP defines a transformation that superposes one structure onto the other. The figure below shows two AFPs that define two very different transformations of the input structures.


As shown in the schematic example below, the green structure has to be rearranged (a twist is introduced at the hinge, indicated by an arrow) so that the green and red structures can be better aligned (i.e., the alignment includes all four AFPs, 1-4, instead of only two, either 1-2 or 3-4).

TOPS+ Models and comparison

TOPS+ is a comparison method that computes a distance between the TOPS+ strings models of two proteins using dynamic programming and identifies the longest common subsequence (LCS), which lists the topologically equivalent secondary structure elements (SSEs) of the two proteins.

For example, the following figure shows the TOPS+ strings alignment between the dihydropteridine reductase proteins from rat (1dhr) and human (1hdr). The TOPS+ strings models for 1dhr and 1hdr are drawn as linear strings, where yellow triangles and red curves indicate the beta strands and alpha helices, respectively, in their up or down orientations. The grey line and purple stubs represent the loop regions and the NAD ligand interactions, respectively. Note that the ligand-interaction information is optional, and we have not used it in this work. The incoming and outgoing arcs are drawn on the SSEs (at the top and bottom of the beta strands): red and green arcs represent the parallel and anti-parallel hydrogen-bond interactions that carry the beta-sheet information, while yellow and blue arcs indicate right and left chirality relationships between the SSEs. A pink arrow between TOPS+ strings elements indicates a conserved element. Dotted arrows indicate conserved alpha helices and beta strands, while plain arrows indicate conserved loop regions.

(a) TOPS+ graph model, (b) TOPS+ strings model, and (c) TOPS+ strings matches between Dihydropteridine reductase from rat (1dhr) and human (1hdr). All the conserved TOPS+ strings elements are shown with pink arrows. Dotted arrows indicate matched helices and strands, plain arrows indicate matched loops, and arrows with double lines indicate matched ligand-interacting loops.
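The LCS step described above can be illustrated with a minimal sketch. It assumes a simplified alphabet in which each SSE is a single character (`E` for a strand, `H` for a helix); real TOPS+ strings elements also carry orientation, arc and loop information, so this is not the TOPS+ implementation, only the dynamic-programming idea. The function name `lcs_align` is hypothetical.

```python
def lcs_align(a, b):
    """Return index pairs (i, j) of one longest common subsequence of a and b."""
    n, m = len(a), len(b)
    # dp[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk forward through the table to recover the matched positions.
    pairs = []
    i = j = 0
    while i < n and j < m:
        if a[i] == b[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return pairs


# Simplified SSE strings (assumption: one character per element).
pairs = lcs_align("EHEHEE", "EHHEE")
print(pairs)
```

Each returned pair names one element from each string that is treated as topologically equivalent; elements left unpaired correspond to insertions or deletions.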

Schematic illustration of TOPS++FATCAT structural alignment

In this work, we test the general idea of pruning the search space of the FATCAT comparison process using topological constraints derived from the TOPS+ strings alignment. Many of the AFPs considered in a FATCAT alignment can be eliminated from the comparison simply by constraining the alignment region. For this purpose, we explore constraints obtained from the TOPS+ strings alignment, which identifies topologically equivalent secondary structure elements (alpha helices, beta strands, and loops). Such equivalences define blocks that restrict the alignment region; AFPs that fall outside these regions are not considered (see panel (b) of the following figure).

We introduce a parameter r to control the strictness of the constraints imposed by the TOPS+ strings alignment: r = 0 restricts the alignment region strictly to the TOPS+ strings alignment, while the program default, r = 1, allows some flexibility in the constrained alignment region (panel (c) of the following figure). We can then speed up the FATCAT alignment by considering only the AFPs within the constrained alignment region (panel (d)). Rigid structural alignment can be treated as a special case of TOPS++FATCAT in which no twist is allowed when chaining AFPs; the TOPS++FATCAT program provides alignment in both rigid mode and flexible mode (the default).

Schematic illustration of FATCAT structural alignment by chaining AFPs in a constrained alignment region defined by the TOPS+ alignment output. (a) In FATCAT, two fragments form an AFP (shown as a line in the graph) according to the criteria described in the text above. (b) The alignment of secondary structure elements from the TOPS+ comparison is used to define the constrained area for AFP detection, in which each pair of aligned secondary structure elements defines an eligible block (shown as filled squares). These blocks may be disconnected, so we join them with connecting blocks (shown as open squares). (c) We add a buffer area (shown as the area enclosed by dashed lines) around the constrained area defined in (b) to obtain the constrained alignment region for FATCAT alignment (shown as the area enclosed by dark lines). (d) Only the AFPs within the constrained alignment region are used in the dynamic programming algorithm for chaining.
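The construction in panels (b) to (d) can be sketched as follows. This is an illustration under stated assumptions, not the program's code: the unit of r is not given on this page and is treated here as a margin in residue positions, the connecting block is taken to span from the end of one eligible block to the start of the next, and the names `build_region` and `afp_in_region` are hypothetical.

```python
def build_region(sse_pairs, r=1):
    """Build the constrained region as a list of padded rectangles.

    sse_pairs: aligned SSEs as ((a_start, a_end), (b_start, b_end)),
    inclusive residue ranges, ordered along both chains.
    Returns rectangles (a_lo, a_hi, b_lo, b_hi).
    """
    blocks = []
    prev_end = None
    for (a0, a1), (b0, b1) in sse_pairs:
        if prev_end is not None:
            # Connecting block: bridges the previous eligible block to this one.
            blocks.append((prev_end[0], a0, prev_end[1], b0))
        blocks.append((a0, a1, b0, b1))  # eligible block
        prev_end = (a1, b1)
    # Buffer of width r around every block (assumed unit: residues).
    return [(a_lo - r, a_hi + r, b_lo - r, b_hi + r)
            for a_lo, a_hi, b_lo, b_hi in blocks]


def afp_in_region(afp, region):
    """afp = (i, j, length): fragment starting at residue i in A and j in B."""
    i, j, length = afp

    def cell_allowed(x, y):
        return any(a_lo <= x <= a_hi and b_lo <= y <= b_hi
                   for a_lo, a_hi, b_lo, b_hi in region)

    # Every residue pair on the AFP's diagonal must lie inside the region.
    return all(cell_allowed(i + k, j + k) for k in range(length))


region = build_region([((0, 9), (0, 9)), ((20, 29), (25, 34))], r=1)
afps = [(2, 2, 8), (2, 25, 8), (8, 8, 8)]
kept = [afp for afp in afps if afp_in_region(afp, region)]
print(kept)
```

Only the AFPs in `kept` would be passed to the chaining step, which is where the speed-up comes from.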

TOPS++FATCAT (chaining) score (more detail)

In TOPS++FATCAT, flexible structure alignment is formulated as an AFP chaining process that allows at most t twists (t = 5); for example, the path connected by blue dotted lines in the alignment graph below represents a possible alignment. Dynamic programming is used in the chaining process (as shown in the figure below). If we denote by S(k) the best score of a chain ending at AFP k, it can be calculated from the best scores ending at the previous AFPs that can be connected to AFP k, subject to the constraints on connecting consecutive AFPs,

where a(k) is the score of AFP k itself, determined by its RMSD (dk) and length (L), with long AFPs rewarded and large RMSDs penalized; c(m→k) is the score of introducing a connection between AFP m and AFP k, defined as a function of the compatibility of the two AFPs and of the mismatched regions (p) and/or gaps (q) created by connecting them; and T(k) is the number of twists required to connect the chain of AFPs leading up to S(k).
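Putting these definitions together, the recurrence can be sketched in LaTeX as below. The exact form is an assumption reconstructed from the definitions above, writing c(m→k) for the connection score and "m → k" for "AFP m can be connected to AFP k"; the inner zero lets a chain start fresh at AFP k.

```latex
S(k) \;=\; a(k) \;+\; \max\Bigl\{\, \max_{m \,:\, m \to k} \bigl[\, S(m) + c(m \to k) \,\bigr],\; 0 \Bigr\},
\qquad \text{subject to } T(k) \le t .
```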

The TOPS++FATCAT (chaining) score is the best of all S(k) in the alignment graph.
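The chaining step can be sketched as a small dynamic program. The AFP scores, the connection score and the twist test are passed in as plain values and callables because their actual definitions are not given on this page; `chain_afps`, `connect_score` and `needs_twist` are hypothetical names, and the example scoring functions are made up for illustration.

```python
from typing import Callable, List, NamedTuple


class AFP(NamedTuple):
    i: int        # start position in protein A
    j: int        # start position in protein B
    length: int   # fragment length
    score: float  # a(k), the score of the AFP itself


def chain_afps(afps: List[AFP],
               connect_score: Callable[[AFP, AFP], float],
               needs_twist: Callable[[AFP, AFP], bool],
               max_twists: int = 5) -> float:
    """Best chaining score over all chains using at most max_twists twists."""
    afps = sorted(afps, key=lambda f: (f.i, f.j))
    neg_inf = float("-inf")
    # S[k][t]: best score of a chain ending at AFP k that uses exactly t twists.
    S = [[neg_inf] * (max_twists + 1) for _ in afps]
    for k, fk in enumerate(afps):
        S[k][0] = fk.score  # a chain consisting of AFP k alone
        for m in range(k):
            fm = afps[m]
            # AFP m must end before AFP k starts in both proteins.
            if fm.i + fm.length > fk.i or fm.j + fm.length > fk.j:
                continue
            dt = 1 if needs_twist(fm, fk) else 0
            c = connect_score(fm, fk)
            for t in range(max_twists + 1 - dt):
                if S[m][t] > neg_inf:
                    S[k][t + dt] = max(S[k][t + dt], S[m][t] + c + fk.score)
    # The chaining score is the best S(k) over all AFPs and twist counts.
    return max((max(row) for row in S), default=0.0)


# Illustrative scoring (assumptions): penalize diagonal shifts, call any shift a twist.
def shift(m, k):
    return abs((k.i - m.i) - (k.j - m.j))

afps = [AFP(0, 0, 8, 10.0), AFP(8, 8, 8, 10.0), AFP(20, 40, 8, 10.0)]
flexible = chain_afps(afps, lambda m, k: -0.25 * shift(m, k),
                      lambda m, k: shift(m, k) != 0, max_twists=1)
rigid = chain_afps(afps, lambda m, k: -0.25 * shift(m, k),
                   lambda m, k: shift(m, k) != 0, max_twists=0)
print(flexible, rigid)
```

Setting `max_twists=0` gives the rigid special case: the third AFP can only be reached through a twist, so it is left out of the best chain.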


The P-value is used in TOPS++FATCAT to evaluate the significance of the detected structural similarity: it is the probability of observing a greater score. It is based on the observation that the TOPS++FATCAT similarity scores between unrelated structures follow an extreme value distribution (EVD). Briefly, the TOPS++FATCAT similarity score incorporates the TOPS++FATCAT chaining score, the RMSD of the resulting superposition, the number of equivalent positions in the alignment, and the number of twists.

The TOPS++FATCAT similarity score is computed as

where cs is the TOPS++FATCAT chaining score; L is the number of equivalent positions in the alignment; RMSD is the overall RMSD between two structures when one structure is rearranged at the positions where twists are detected by TOPS++FATCAT; N is the number of blocks in the alignment (number of twists + 1).

P-value of s is then computed as

where the location and scale parameters of the EVD of TOPS++FATCAT similarity scores of random structures were determined by empirical simulation.
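As an illustration of how such a P-value can be evaluated, the sketch below assumes the EVD takes the Gumbel (type I) form, for which P(S > s) = 1 - exp(-exp(-(s - mu) / sigma)) with location mu and scale sigma. The function name and the parameter values in the example are hypothetical; the fitted values used by TOPS++FATCAT are not given on this page.

```python
import math


def evd_p_value(s, mu, sigma):
    """Probability of a score greater than s under a Gumbel EVD.

    mu is the location parameter and sigma (> 0) the scale parameter.
    """
    z = (s - mu) / sigma
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for very small tails.
    return -math.expm1(-math.exp(-z))


# Hypothetical parameters, for illustration only.
for score in (0.0, 2.0, 5.0):
    print(score, evd_p_value(score, mu=0.0, sigma=1.0))
```

Higher similarity scores give smaller P-values, so a small P-value indicates a similarity that is unlikely between unrelated structures.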


align_len: The length of the alignment (including gaps).


opt_len: The number of equivalent positions in the alignment:
opt_len = align_len - gap


opt-rmsd: The root mean square deviation (RMSD) of the aligned Cα atoms of the input structures, with one input structure rearranged if flexibility is detected (i.e., twists are introduced in the alignment).


chain-rmsd: The RMSD of the aligned Cα atoms of the input structures without structural rearrangement, even if structural flexibility is detected in the alignment. In cases with flexibility (i.e., twists are introduced to obtain the alignment), the value of chain-rmsd can be artificially very high, because the flexible alignment is longer than the rigid alignment. Comparing chain-rmsd with opt-rmsd nevertheless shows how significantly the conformational flexibility introduced in comparing the structures improves the alignment.
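The two RMSD values use the same formula applied to different coordinates: once with one structure rearranged at the twists, once without rearrangement. A minimal sketch, assuming the aligned Cα coordinates are already superposed and supplied as (x, y, z) tuples; the function names are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD over paired, already superposed Cα coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def equivalent_positions(align_len, gap):
    """Number of equivalent positions: alignment length minus gaps."""
    return align_len - gap


a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
b = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(rmsd(a, b), equivalent_positions(207, 22))
```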

TOPS++FATCAT Reference: Mallika Veeramalai, Yuzhen Ye and Adam Godzik. "TOPS++FATCAT: fast flexible structural alignment using constraints derived from TOPS+ Strings Model". BMC Bioinformatics, 9:358, 2008.


This research was supported by NIH grant P20 GM076221 JCMM (Joint Center for Molecular Modeling)