CAFASP criteria for evaluating 1D prediction accuracy
This page describes the criteria used for evaluating 1D structure predictions in the context of CAFASP.
- Currently anticipated 'disciplines':
- secondary structure prediction
- solvent accessibility prediction
- Measuring accuracy:
- separation of targets:
proteins will be sorted into two categories: (1) those with significant sequence similarity to known structure; (2) those with no significant similarity to known structure. The two categories will be evaluated separately.
- secondary structure:
- per-residue accuracy (mainly Q3, others, see below)
- per-segment accuracy (SOV)
- if available: separate evaluation of turn-prediction
- solvent accessibility:
- per-residue correlation of relative accessibility
- two-state accuracy (Q2, buried/exposed)
- reliability indices:
plot of coverage vs. accuracy (one line per group, all groups in one graph)
- Reporting accuracy on Web site:
- per-protein view:
- groups sorted alphabetically
- all predictions given in detail (à la JPRED)
- for each group, the difference between its accuracy for this protein and its average accuracy
Note: the principal philosophy here is that levels of accuracy are meaningless when comparing predictions for ONE protein! Thus, anything that allows users to rank groups will be restricted to the overall-performance view (see next).
- overall-performance view (and ranking):
- Assumption: there are at least NMIN targets in the non-homology category that are predicted by ALL groups. (If some groups have not predicted ALL non-homology targets, we shall have sub-comparisons: (i) all groups that predicted more than NMIN, (ii) all groups that predicted any reasonable number of proteins, focused on common subset < NMIN.) Note: currently, we anticipate a value of NMIN = 15, which implies that Q3 differences below about two percentage points are NOT significant!
- For the per-protein view (see above), groups will be sorted alphabetically.
- For the overall-performance view, groups will be ranked in the following way. (1) The results will be clustered according to significant differences. (2) If N1 groups stand out given all criteria, they will come on top (sorting amongst the N1 will be alphabetical). (3) If N2 groups are clearly performing less well, they will come at the bottom (sorted alphabetically). (4) All groups that cannot be distinguished (given too few proteins and/or too similar levels of accuracy) will be marked as 'indistinguishable', and will be sorted in alphabetical order.
- Coverage vs. accuracy (per-residue accuracy) for all proteins predicted by all groups.
- Temporary web pages (this document):
http://cubic.bioc.columbia.edu/eva/doc/cafasp.html
- Contact:
Burkhard Rost (rost@columbia.edu)
- Team responsible for CAFASP 1D evaluation:
- Secondary structure:
- methods DSSP and STRIDE. Note: STRIDE values will be reported if and only if some groups stand out (positively) according to STRIDE but NOT according to DSSP. In contrast, if rankings appear similar for STRIDE and DSSP, we shall only report DSSP-related values for accuracy.
- conversion of DSSP: [HG] -> helix, [BE] -> strand, all others -> non-regular
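The DSSP conversion above can be sketched as follows. This is a minimal illustration, not the evaluation code itself; the function name `dssp_to_three_state` and the output letters `H`, `E`, `L` are assumptions made for this example.

```python
# Three-state reduction of DSSP codes, following the conversion stated above:
# H, G -> helix; B, E -> strand; everything else -> non-regular.
# The output alphabet ('H', 'E', 'L') is an illustrative choice.
HELIX_CODES = frozenset("HG")
STRAND_CODES = frozenset("BE")


def dssp_to_three_state(dssp: str) -> str:
    """Map a string of DSSP codes to a three-state string of equal length."""
    out = []
    for code in dssp:
        if code in HELIX_CODES:
            out.append("H")
        elif code in STRAND_CODES:
            out.append("E")
        else:
            # Covers I, T, S, blank, '-' and any other symbol.
            out.append("L")
    return "".join(out)
```

For example, `dssp_to_three_state("HGIEBTS")` maps the first two codes to helix, `E` and `B` to strand, and the rest to non-regular.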
- Solvent accessibility:
- Connolly surface according to DSSP (with standard water molecule size).
- projection of square Ångstrøm to relative accessibility according to Rost & Sander, Proteins, 1994, 19, 55-72.
- two-state levels of accuracy: residues with a solvent accessibility ≤ 16% will be considered buried, all others will be considered exposed. If the ranking resulting from placing the cut-off at 25% differs considerably, these results will be reported separately.
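A minimal sketch of the two-state projection, assuming relative accessibility is already available as a percentage per residue. The function name `two_state` and the labels `b`/`e` are hypothetical, and treating the cut-off as inclusive is an assumption of this sketch.

```python
def two_state(relative_acc_percent, cutoff=16.0):
    """Label each residue 'b' (buried) or 'e' (exposed).

    relative_acc_percent: relative solvent accessibility per residue, in percent.
    cutoff: 16.0 by default; 25.0 is the alternative cut-off mentioned above.
    Assumption: a residue exactly at the cut-off counts as buried.
    """
    return ["b" if acc <= cutoff else "e" for acc in relative_acc_percent]
```

Calling `two_state(values, cutoff=25.0)` gives the labelling for the alternative cut-off, so both rankings can be compared.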
- Separating proteins into similar to known structure and not similar
- cut-off: 25% identical residues over more than 80 residues
- Rationale: (i) well established in the literature (Sander & Schneider, Proteins, 1991, 9, 56-68), (ii) very conservative: a recent analysis on some million pairs (Rost, Prot. Engng., 1999, 12, 85-94) showed that if all proteins found above this threshold are taken as true positives (similar structures), then about 90% of those predictions ARE WRONG!! At a cut-off of 23%, about 93% are wrong, at 27%, 70% are wrong, and at 33% none are. Thus, 25% is a fairly conservative saturation point.
- Comparisons of groups will focus entirely on the NON-similar proteins.
- Results for similar proteins will be reported if and only if it seems that secondary structure or solvent accessibility predictions in this regime do better than other techniques (homology modelling, threading) evaluated under the same criteria.
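The separation rule can be sketched as below. The input format (a list of percentage-identity and alignment-length pairs against known structures) and both function names are assumptions for illustration; counting exactly 25% identity as similar is also an assumption, since the text does not say whether the cut-off is inclusive.

```python
def is_similar_hit(percent_identity, aligned_length,
                   min_identity=25.0, min_length=80):
    """True if one alignment meets the cut-off: 25% identical residues
    over more than 80 residues. Inclusive identity bound is an assumption."""
    return percent_identity >= min_identity and aligned_length > min_length


def target_category(hits):
    """Assign a target to 'similar' or 'non-similar'.

    hits: iterable of (percent_identity, aligned_length) pairs, one per
    alignment of the target against a protein of known structure.
    """
    if any(is_similar_hit(pid, length) for pid, length in hits):
        return "similar"
    return "non-similar"
```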
- Essential for final report of secondary structure prediction
- Q3 = overall three-state per-residue accuracy averaged over all proteins; additionally we shall monitor levels of accuracy for the information index and Matthews correlations (all scores defined as in: Rost & Sander, JMB, 1993, 232, 584-599)
- SOV = overall three-state per-segment accuracy averaged over all proteins (Zemla et al., Proteins, 1999, 34, 220-223)
- standard deviations for Q3 and SOV
- reliability of prediction:
PLOT: %residues predicted / %residues predicted correctly. This shows how reliability indices help to locate the more accurately predicted residues; the particular way of doing this plot makes it completely independent of the scheme used for reporting 'reliability'. The plot gives an average over all predictions from a group, that is, ONE line per group, so it should be easy to have ONE graph for ALL groups. If any group of methods stands out, it will be reported separately.
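One way to compute such a curve is sketched below. It uses only the rank order of the reliability values, which is what makes the plot independent of the reporting scheme. The function name and input format are assumptions for this example, not the evaluation code.

```python
def coverage_accuracy_curve(residues):
    """Compute points of the coverage vs. accuracy plot for one group.

    residues: list of (reliability, is_correct) pairs, pooled over all
    predictions from the group. Higher reliability means more confident.
    Returns a list of (percent_residues_predicted, percent_correct) points,
    strictest reliability threshold first.
    """
    ranked = sorted(residues, key=lambda r: r[0], reverse=True)
    total = len(ranked)
    points = []
    n_correct = 0
    for i, (reliability, is_correct) in enumerate(ranked, start=1):
        n_correct += bool(is_correct)
        # Emit one point per distinct reliability value, so tied residues
        # enter the curve together (ranked[i] is the next residue).
        last_of_tie = i == total or ranked[i][0] != reliability
        if last_of_tie:
            points.append((100.0 * i / total, 100.0 * n_correct / i))
    return points
```

Plotting the points of every group on the same axes gives one line per group in a single graph.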
- rank of prediction methods:
- FIRST of all: NO rank for things that cannot be distinguished since the difference in accuracy is below the statistical significance for the size of the set given. For details see above.
- sorted by quality, separated into classes of quality such that within each class sorting does NOT make sense. Rule of thumb: say we have 25 proteins and sigma is 10; then the threshold for significant differences is:
Example:
10/sqrt(25) = 10/5 = 2 percentage points
-> predictions with the following levels of accuracy
75,68,69,60
would be grouped as
group 1: 75
group 2: 68, 69 (sorted alphabetically)
group 3: 60
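The rule of thumb in this example can be sketched in code. The grouping strategy used here (start a new class whenever two neighbouring scores differ by more than sigma/sqrt(N)) is an assumption that reproduces the example; the actual clustering may differ, and with many closely spaced scores this simple scheme can chain methods together.

```python
import math


def significance_threshold(sigma, n_proteins):
    """Rule-of-thumb threshold: sigma / sqrt(number of proteins)."""
    return sigma / math.sqrt(n_proteins)


def group_by_significance(scores, sigma, n_proteins):
    """Split methods into quality classes, best class first.

    scores: dict mapping method name -> average accuracy (percent).
    Within each class, names are sorted alphabetically.
    """
    threshold = significance_threshold(sigma, n_proteins)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    groups = []
    previous = None
    for name, value in ranked:
        # Open a new class when the gap to the previous score is significant.
        if previous is None or previous - value > threshold:
            groups.append([])
        groups[-1].append(name)
        previous = value
    return [sorted(group) for group in groups]
```

With hypothetical method names `A` to `D` scoring 75, 68, 69 and 60 on 25 proteins with sigma 10, the threshold is 2 percentage points and the classes come out as `[["A"], ["B", "C"], ["D"]]`.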
- ranking will be based on a variety of scores. Currently, we anticipate the following scenario:
- compile ranks separately for
per-residue scores: Q3, information, Matthews correlation
and one per-segment score: SOV
- if all measures yield same ranking: report only Q3
- if ranks differ considerably between per-residue and per-segment scores: report both rankings
- if ranks differ between different per-residue scores (Q3, information, correlation): merge measures yielding differing ranks into one score, and rank again
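For reference, the Q3 score and its standard deviation listed at the top of this section can be sketched as follows. Q3 is computed per protein and then averaged over proteins, as stated above. The use of the sample standard deviation is an assumption, and the function names are illustrative.

```python
import statistics


def q3(observed, predicted):
    """Three-state per-residue accuracy for one protein, in percent.

    observed, predicted: equal-length strings over the three states.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(o == p for o, p in zip(observed, predicted))
    return 100.0 * matches / len(observed)


def q3_summary(pairs):
    """Mean and standard deviation of per-protein Q3.

    pairs: list of (observed, predicted) strings, one pair per protein.
    Assumption: sample standard deviation (n - 1 in the denominator).
    """
    per_protein = [q3(o, p) for o, p in pairs]
    mean = statistics.mean(per_protein)
    std = statistics.stdev(per_protein) if len(per_protein) > 1 else 0.0
    return mean, std
```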
Possibly important for final report of secondary structure prediction
- QH | QE = percentage of correctly predicted helices | strands (as percentage of observed AND predicted helices | strands; Rost & Sander, JMB, 1993, 232, 584-599)
- BAD = percentage of residues predicted in helix and observed in strand or predicted in strand and observed in helix (Defay & Cohen, Proteins, 1995, 23, 431-445)
- SovH | SovE: per-segment accuracy of predicting helix | strand (Zemla et al., Proteins, 1999, 34, 220-223)
- accuracy of predicting structural classes (all-alpha, all-beta, alpha/beta), classification will be slightly altered from the definitions of Zhang & Chou, Prot Sci., 1992, 1, 401-408 and Kneller, Cohen & Langridge, JMB, 1990, 214, 171-182. Note: the following table is to be read as an if alpha, elsif beta, elsif alpha-beta, else mix.
| class | protein length | percentage H | percentage E |
| all-alpha | > 60 | > 45% | < 5% |
| all-beta | > 60 | < 5% | > 45% |
| alpha-beta | > 60 | > 30% | > 20% |
| mixed | other | other | other |
For these classes we shall report the overall four-class accuracy.
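The class table, read top to bottom as if / elsif / else, might be coded like this. The function names and the three-state input strings (`H`, `E`, anything else) are assumptions for this sketch.

```python
def structural_class(length, percent_h, percent_e):
    """Apply the class table in order: all-alpha, all-beta, alpha-beta, mixed."""
    if length > 60:
        if percent_h > 45 and percent_e < 5:
            return "all-alpha"
        if percent_h < 5 and percent_e > 45:
            return "all-beta"
        if percent_h > 30 and percent_e > 20:
            return "alpha-beta"
    return "mixed"


def class_from_states(states):
    """Derive the class from a three-state string ('H', 'E', other)."""
    n = len(states)
    if n == 0:
        return "mixed"
    return structural_class(n,
                            100.0 * states.count("H") / n,
                            100.0 * states.count("E") / n)


def four_class_accuracy(pairs):
    """Percentage of proteins whose predicted class equals the observed class.

    pairs: list of (observed_states, predicted_states), one pair per protein.
    """
    hits = sum(class_from_states(o) == class_from_states(p) for o, p in pairs)
    return 100.0 * hits / len(pairs)
```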
- difference in numbers and average lengths of helices predicted and observed
- Q3 accuracy for the 50% most accurately predicted residues
Possibly more details for selected number of groups
If N groups stand out as the best ones (and if N is considerably smaller than the number of participating groups), then we shall also report the following overall table for each outstanding group:
- N = residues in helix, strand, other
obs= observed
prd= predicted
| | NHprd | NEprd | NLprd | SUMobs |
| NHobs | 22 | 2 | 3 | 25 |
| NEobs | 6 | 33 | 8 | 21 |
| NLobs | 9 | 10 | 0 | 55 |
| SUMprd | 37 | . | . | . |
Note: the numbers do NOT sum in this example; what it means is that 22 residues were correctly predicted in helix, the total observed in helix = 25, the total predicted in helix = 37, and so forth.
- observed: number of helices, strands, other
- observed: average length of helices, strands, other
- predicted: number of helices, strands, other
- predicted: average length of helices, strands, other
- information content & Matthews correlation indices
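A sketch of how the observed-versus-predicted table and a per-state Matthews correlation could be derived from three-state strings. The state letters and function names are assumptions; the information index is not reproduced here, since its definition should be taken from the cited paper.

```python
import math

STATES = ("H", "E", "L")  # helix, strand, other (illustrative letters)


def confusion_matrix(observed, predicted):
    """counts[o][p] = number of residues observed in o and predicted in p."""
    counts = {o: {p: 0 for p in STATES} for o in STATES}
    for o, p in zip(observed, predicted):
        counts[o][p] += 1
    return counts


def matthews(counts, state):
    """Matthews correlation for one state against the other two."""
    tp = counts[state][state]
    fn = sum(counts[state][p] for p in STATES if p != state)  # missed
    fp = sum(counts[o][state] for o in STATES if o != state)  # over-predicted
    total = sum(counts[o][p] for o in STATES for p in STATES)
    tn = total - tp - fn - fp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Row sums of `counts` give the observed totals (SUMobs) and column sums give the predicted totals (SUMprd).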
Essential for final report of solvent accessibility prediction
- correlation between predicted and observed relative solvent accessibility (first computed over all residues in one protein, then averaged over all proteins reported; Rost & Sander, Proteins, 1994, 19, 55-72)
- Q2 = overall two-state per-residue accuracy averaged over all proteins (Rost & Sander, Proteins, 1994, 19, 55-72)
- standard deviations for correlation and Q2
- reliability of prediction: same procedure as for secondary structure
- ranking: same procedure as for secondary structure
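The two essential accessibility scores could be computed roughly as follows. This sketch assumes the correlation is the Pearson coefficient and that the buried cut-off is inclusive; both are assumptions, as are the function names.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (0.0 if undefined)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mean_correlation(proteins):
    """Correlation per protein first, then the average over all proteins.

    proteins: list of (observed, predicted) relative accessibility lists.
    """
    values = [pearson(obs, prd) for obs, prd in proteins]
    return sum(values) / len(values)


def q2(observed, predicted, cutoff=16.0):
    """Two-state (buried/exposed) per-residue accuracy for one protein, in percent."""
    same = sum((o <= cutoff) == (p <= cutoff) for o, p in zip(observed, predicted))
    return 100.0 * same / len(observed)
```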
Possibly important for final report of solvent accessibility prediction
- Qb | Qe = percentage of correctly predicted buried | exposed residues (as percentage of observed AND predicted buried | exposed residues; Rost & Sander, JMB, 1993, 232, 584-599); cut-off thresholds explored: (1) ≤ 16%, (2) ≤ 25% = buried.
- correlation | Q2 accuracy for the 50% most accurately predicted residues
- Sorting:
as overall-performance view, BUT: NO sort, since we run the risk that the best method is simply the one most recently developed!!
- Prediction:
detailed per-residue prediction for all methods
- Accuracy:
for each group (sorted alphabetically, since ranking makes NO sense here). We refrain from displaying ANY measure that may tempt users to rank methods for single proteins; only deviations of accuracy from the group averages are reported:
- Q3(this protein) - Q3(group average)
- SOV(this protein) - SOV(group average)
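The two deviations above amount to a simple subtraction. In this sketch the group average is taken to be the mean of that group's scores over all proteins it predicted; that reading and the function name are assumptions.

```python
def deviation_from_group_average(score_this_protein, group_scores):
    """Score for this protein minus the group's average over its proteins.

    group_scores: the group's scores (e.g. Q3 or SOV, in percent) for all
    proteins it predicted, including this one.
    """
    return score_this_protein - sum(group_scores) / len(group_scores)
```

A positive value means the group did better on this protein than it does on average; the absolute accuracy is deliberately not shown.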
Additionally, the following may be reported (subject to current debate in consulting group).
- number of helices predicted / observed
- number of strands predicted / observed
- average length of helices predicted / observed
- average length of strands predicted / observed