Contents

  1. Short synopsis of category
    disciplines - accuracy - www-reports - URL - contact - responsible group
  2. Detailed description

  3. 3D->1D - overall performance - per protein view


CAFASP criteria for evaluating 1D prediction accuracy

This page describes the criteria used for evaluating 1D structure predictions in the context of CAFASP.


Short synopsis of category



Detailed criteria

Projecting 3D to 1D

Overall-performance view: prediction accuracy

  1. Essential for final report of secondary structure prediction
    1. Q3 = overall three-state per-residue accuracy averaged over all proteins; additionally we shall monitor levels of accuracy for the information index and the Matthews correlation coefficients (all scores defined as in: Rost & Sander, JMB, 1993, 232, 584-599)
    2. SOV = overall three-state per-segment accuracy averaged over all proteins (Zemla et al., Proteins, 1999, 34, 220-223)
    3. standard deviations for Q3 and SOV
    4. reliability of prediction:
      PLOT: % of residues predicted versus % of residues predicted correctly. This shows how reliability indices help to locate the more accurately predicted residues; plotting it this way makes the result completely independent of the scheme used for reporting 'reliability'. The plot gives an average over all predictions from a group, that is, ONE line per group, so it should be easy to show ONE graph for ALL groups. If any group of methods stands out, it will be reported separately.
    5. rank of prediction methods:
      • FIRST of all: NO rank for methods that cannot be distinguished because the difference in accuracy is below the statistical significance for the size of the set given. For details see above.
      • sorted by quality, separated into classes of quality such that sorting within each class does NOT make sense. Rule of thumb: with 25 proteins and a standard deviation (sigma) of 10, the significant difference is:
        Example:
        10/sqrt(25) = 10/5 = 2 percentage points
        -> predictions with the following levels of accuracy
        75, 68, 69, 60
        would be grouped as
        group 1: 75
        group 2: 68 and 69 (they differ by less than 2 points), sorted alphabetically
        group 3: 60
      • ranking will be based on a variety of scores. Currently, we anticipate the following scenario:
        1. compile ranks separately for
          per-residue scores: Q3, information, Matthews correlation
          and one per-segment score: SOV
        2. if all measures yield same ranking: report only Q3
        3. if ranks differ considerably between per-residue and per-segment scores: report both rankings
        4. if ranks differ between different per-residue scores (Q3, information, correlation): merge measures yielding differing ranks into one score, and rank again
  2. Possibly important for final report of secondary structure prediction
    1. QH | QE = percentage of correctly predicted helices | strands (as percentage of observed AND predicted helices | strands; Rost & Sander, JMB, 1993, 232, 584-599)
    2. BAD = percentage of residues predicted in helix and observed in strand or predicted in strand and observed in helix (Defay & Cohen, Proteins, 1995, 23, 431-445)
    3. SovH | SovE = per-segment accuracy of predicting helix | strand (Zemla et al., Proteins, 1999, 34, 220-223)
    4. accuracy of predicting structural classes (all-alpha, all-beta, alpha-beta, mixed); the classification will be slightly altered from the definitions of Zhang & Chou, Prot Sci., 1992, 1, 401-408 and Kneller, Cohen & Langridge, JMB, 1990, 214, 171-182. Note: the following table is to be read as a cascade: if all-alpha, else if all-beta, else if alpha-beta, else mixed.
      class       protein length  percentage H  percentage E
      all-alpha   > 60            > 45%         < 5%
      all-beta    > 60            < 5%          > 45%
      alpha-beta  > 60            > 30%         > 20%
      mixed       other           other         other
      For these classes we shall report the overall four-class accuracy.
    5. difference in numbers and average lengths of helices predicted and observed
    6. Q3 accuracy for the 50% most accurately predicted residues
  3. Possibly more details for selected number of groups
    If N groups stand out as the best ones (and if N is considerably smaller than the number of participating groups), then we shall also report the following overall table for each outstanding group:
  4. Essential for final report of solvent accessibility prediction
  5. Possibly important for final report of solvent accessibility prediction
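
Two of the rules in the list above can be sketched in a few lines of Python: the rule-of-thumb significance threshold used to group methods for ranking, and the structural-class cascade. This is only an illustration, not the evaluators' actual code. All function names are hypothetical. The grouping procedure is an assumption (a method joins a group when it lies less than one threshold below that group's best score, and a difference exactly equal to the threshold counts as significant); the page itself only gives the threshold and one example.

```python
import math


def significance_threshold(sigma, n_proteins):
    """Rule of thumb from the text: sigma / sqrt(N), in percentage points."""
    return sigma / math.sqrt(n_proteins)


def group_by_significance(scores, sigma, n_proteins):
    """Group methods whose accuracies cannot be distinguished.

    scores: dict mapping method name -> mean accuracy in percent (e.g. Q3).
    Assumption: a new group starts whenever a score lies at least one
    threshold below the best score of the current group.
    Returns a list of groups, best first, each sorted alphabetically.
    """
    threshold = significance_threshold(sigma, n_proteins)
    ranked = sorted(scores.items(), key=lambda item: -item[1])
    groups = []
    best_in_group = None
    for name, score in ranked:
        if best_in_group is None or best_in_group - score >= threshold:
            groups.append([])
            best_in_group = score
        groups[-1].append(name)
    return [sorted(group) for group in groups]


def structural_class(length, pct_helix, pct_strand):
    """Apply the class table as a cascade: alpha, beta, alpha-beta, mixed."""
    if length > 60 and pct_helix > 45 and pct_strand < 5:
        return "all-alpha"
    if length > 60 and pct_helix < 5 and pct_strand > 45:
        return "all-beta"
    if length > 60 and pct_helix > 30 and pct_strand > 20:
        return "alpha-beta"
    return "mixed"


def four_class_accuracy(predicted, observed):
    """Percentage of proteins whose predicted class equals the observed class."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty class lists of equal length")
    hits = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * hits / len(observed)


if __name__ == "__main__":
    # Example from the text: 25 proteins, sigma = 10, accuracies 75, 68, 69, 60.
    # The method names A-D are placeholders.
    example = {"A": 75, "B": 68, "C": 69, "D": 60}
    print(significance_threshold(10, 25))
    print(group_by_significance(example, sigma=10, n_proteins=25))
    print(structural_class(length=120, pct_helix=50, pct_strand=2))
```

With the example values from the text, methods B and C (68 and 69) fall into the same group because they differ by less than the 2-point threshold, while A and D each form a group of their own.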

Per-protein view: prediction accuracy