Interpreting your results:
1. Interpreting your results from RosettaDesign:
The energies, structure and sequence output by RosettaDesign are written to
a single PDB file.
The PDB file has the following sections:
1) coordinates of the design structure
2) a list of scores. Many of these scores are used in ab initio structure prediction and are not particularly
relevant to protein design. The scores used during design with the default protocols are:
total: the total score using the design energy function (lower is better)
LJatr: attractive portion of the Lennard-Jones potential (rewards close contacts)
LJrep: repulsive portion of the Lennard-Jones potential (penalizes overlaps)
LKsol: Lazaridis-Karplus solvation model (penalizes buried polar groups)
Erot: internal energy of side-chain rotamers, derived from Dunbrack's statistics
Eintra: intra-residue clashes
Epair: statistics-based pair term; favors salt bridges
Eaa_phipsi: P(aa|phi,psi), Ramachandran preferences
hb_sc: side chain-side chain and side chain-backbone hydrogen bond energy
hb_srbb: backbone-backbone hydrogen bonds between residues close in primary sequence
hb_lrbb: backbone-backbone hydrogen bonds between residues distant in primary sequence
3) a table of energies for each residue in the protein; totals are at the bottom
LJatr: Lennard-Jones attractive
LJrep: Lennard-Jones repulsive
LKsol: Lazaridis-Karplus solvation energy
Eh2o_sol: solvation using explicit water; not used in default mode
Eaa_phipsi: probability of an amino acid given phi and psi
Erot: rotamer internal energies
Eintra: internal clashes within residues
Ehbnd: total hydrogen bonding per residue
Epair: pair probabilities derived from the PDB
Eref: reference energy for each amino acid
Egb: generalized Born solvation energy; not used in default mode
Eh2o, Eh2o_hb: energies from explicit waters
Ecst: constraint energies
Eres: total for that residue (lower is better)
example of residue energy table:
res aa  LJatr LJrep LKsol Eh2o_sol Eaa_phipsi  Erot Eintra Ehbnd Epair  Eref   Egb  Eh2o Eh2o_hb  Ecst  Eres
  1 MET  -4.0   0.3   1.4      0.0        0.0   2.5    0.3  -0.8   0.0   0.3   0.0   0.0     0.0   0.0  -0.7
  2 GLN  -2.5   0.1   1.5      0.0        0.0   2.9    0.0  -0.7  -0.1   1.0   0.0   0.0     0.0   0.0   0.3
  3 ILE  -4.1   0.1   1.2      0.0       -0.2   0.1    0.4  -1.6   0.0  -0.2   0.0   0.0     0.0   0.0  -3.9
  4 PHE  -4.4   0.5   1.8      0.0       -0.3   0.2    0.0  -1.6   0.0  -0.6   0.0   0.0     0.0   0.0  -3.2
  5 THR  -3.5   0.0   1.8      0.0        0.0   0.0    0.0  -1.4   0.0   0.3   0.0   0.0     0.0   0.0  -3.3
  6 LYS  -3.0   0.0   1.5      0.0        0.1   0.9    0.1  -1.7   0.0   0.6   0.0   0.0     0.0   0.0  -2.8
  7 THR  -3.1   0.1   1.9      0.0       -0.7   0.3    0.0  -1.2   0.0   0.3   0.0   0.0     0.0   0.0  -3.0
  8 LEU  -1.3   0.0   0.5      0.0        0.0   1.6    0.5   0.0   0.0   0.1   0.0   0.0     0.0   0.0   1.2
  9 THR  -1.4   0.1   1.1      0.0       -0.1   0.8    0.0  -0.2   0.0   0.3   0.0   0.0     0.0   0.0   0.1
 10 GLY  -0.8   0.1   0.6      0.0       -1.5  -0.0    0.0  -0.3   0.0   0.2   0.0   0.0     0.0   0.0  -2.2
 11 LYS  -2.8   0.1   2.4      0.0        0.0   4.4    0.2  -0.7  -0.3   0.6   0.0   0.0     0.0   0.0   2.6
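The per-residue table is whitespace-separated, so it can be read with a few lines of Python. The following is only a sketch: the function name parse_residue_energies is hypothetical, and it assumes the caller has already pulled the table lines out of the output PDB file and that data rows begin with an integer residue number, as in the example above.

```python
# Column names copied from the header row of the example table.
COLUMNS = ["res", "aa", "LJatr", "LJrep", "LKsol", "Eh2o_sol", "Eaa_phipsi",
           "Erot", "Eintra", "Ehbnd", "Epair", "Eref", "Egb", "Eh2o",
           "Eh2o_hb", "Ecst", "Eres"]


def parse_residue_energies(lines):
    """Return one dict per data row of the per-residue energy table."""
    rows = []
    for line in lines:
        fields = line.split()
        # Data rows start with an integer residue number and have one value
        # per column; the header row and any other text are skipped.
        if len(fields) != len(COLUMNS) or not fields[0].isdigit():
            continue
        row = {"res": int(fields[0]), "aa": fields[1]}
        for name, value in zip(COLUMNS[2:], fields[2:]):
            row[name] = float(value)
        rows.append(row)
    return rows


# A few rows taken from the example table. A real run would read these
# lines from the output PDB file instead (assumption).
table = """\
res aa LJatr LJrep LKsol Eh2o_sol Eaa_phipsi Erot Eintra Ehbnd Epair Eref Egb Eh2o Eh2o_hb Ecst Eres
1 MET -4.0 0.3 1.4 0.0 0.0 2.5 0.3 -0.8 0.0 0.3 0.0 0.0 0.0 0.0 -0.7
2 GLN -2.5 0.1 1.5 0.0 0.0 2.9 0.0 -0.7 -0.1 1.0 0.0 0.0 0.0 0.0 0.3
3 ILE -4.1 0.1 1.2 0.0 -0.2 0.1 0.4 -1.6 0.0 -0.2 0.0 0.0 0.0 0.0 -3.9
8 LEU -1.3 0.0 0.5 0.0 0.0 1.6 0.5 0.0 0.0 0.1 0.0 0.0 0.0 0.0 1.2
11 LYS -2.8 0.1 2.4 0.0 0.0 4.4 0.2 -0.7 -0.3 0.6 0.0 0.0 0.0 0.0 2.6
"""

rows = parse_residue_energies(table.splitlines())
# Lower Eres is better, so sort with the highest (least favorable) first.
worst = sorted(rows, key=lambda r: r["Eres"], reverse=True)[:3]
for r in worst:
    print(r["res"], r["aa"], r["Eres"])
```

With the rows shown, the three least favorable residues by Eres are 11 LYS, 8 LEU and 2 GLN. Such positions are natural candidates for a closer look in a structure viewer.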
4) a table of measured energies minus expected energies. The expected energies
are the average energies of each amino acid type with a given number of
neighbors, calculated over a large set of proteins in the PDB.
This table is useful for determining how well packed a residue is. The Elj column
compares the actual Lennard-Jones energy of a residue to the expected value.
Well-packed residues should have Elj scores near zero or negative.
nb: number of neighbors (used to look up the expected energies)
LJatr: Lennard-Jones attractive
LJrep: Lennard-Jones repulsive
LKsol: Lazaridis-Karplus solvation
Eaa_phipsi: P(aa|phi,psi)
Erot: rotamer preferences from Dunbrack's library
Eintra: intra-residue clashes
Ehbnd: hydrogen bonding
Epair: statistics-based pair term
Elj: Lennard-Jones total
Eres: total per residue
SASApack: related to the void volume in a protein. Surface areas are computed with
a 1.4 angstrom probe and a 0.5 angstrom probe, and the difference (ASA_0.5 - ASA_1.4) is
compared to the expected difference for that residue type in that environment. A negative
value is favorable and indicates that the residue is more tightly packed than is typical
of structures in the PDB.
example:
energies-average(in pdb) energies, AND rsd SASA packing score
res aa   nb LJatr LJrep LKsol Eaa_phipsi  Erot Eintra Ehbnd Epair   Elj  Eres SASApack
  1 MET  15  -0.8  -0.1   0.0        0.0  -0.2    0.2  -0.1   0.0  -1.0  -1.7     2.38
  2 GLN  11   0.2  -0.2  -0.3        0.1   0.3    0.0  -0.1   0.0   0.0  -0.4     8.93
  3 ILE  22   0.3  -0.2  -0.3        0.0  -1.2    0.4  -0.6   0.0   0.1  -1.8    18.02
  4 PHE  14  -0.3   0.1   0.4       -0.2  -1.0   -0.1  -1.0   0.0  -0.2  -2.5     1.89
  5 THR  22   0.2  -0.3  -0.5        0.1  -0.5    0.0  -0.3   0.1   0.0  -1.0    11.62
  6 LYS  15   0.3  -0.4  -0.5        0.1  -2.2    0.1  -0.9   0.2  -0.1  -3.6     5.48
  7 THR  18   0.1  -0.2  -0.2       -0.6  -0.3    0.0  -0.2   0.1  -0.1  -1.1    10.38
  8 LEU  10   0.9  -0.2  -0.6        0.1   0.1    0.5   0.5   0.0   0.8   0.9     1.33
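Because Elj is the column to watch for packing, it can be convenient to list the residues whose Elj is clearly above zero. The sketch below is illustrative only: the function name poorly_packed and the 0.5 cutoff are assumptions (the text above says only "near zero or negative"), and the table lines are assumed to have been extracted from the output file already.

```python
def poorly_packed(table_lines, elj_cutoff=0.5):
    """List (res, aa, Elj) for rows whose Elj exceeds elj_cutoff.

    elj_cutoff is a hypothetical threshold chosen for illustration.
    """
    header = None
    flagged = []
    for line in table_lines:
        fields = line.split()
        if fields[:2] == ["res", "aa"]:
            header = fields  # remember the column names
            continue
        # Skip anything before the header or with the wrong field count.
        if header is None or len(fields) != len(header):
            continue
        row = dict(zip(header, fields))
        elj = float(row["Elj"])
        if elj > elj_cutoff:
            flagged.append((int(row["res"]), row["aa"], elj))
    return flagged


# Three rows copied from the example above.
table = """\
energies-average(in pdb) energies, AND rsd SASA packing score
res aa nb LJatr LJrep LKsol Eaa_phipsi Erot Eintra Ehbnd Epair Elj Eres SASApack
1 MET 15 -0.8 -0.1 0.0 0.0 -0.2 0.2 -0.1 0.0 -1.0 -1.7 2.38
4 PHE 14 -0.3 0.1 0.4 -0.2 -1.0 -0.1 -1.0 0.0 -0.2 -2.5 1.89
8 LEU 10 0.9 -0.2 -0.6 0.1 0.1 0.5 0.5 0.0 0.8 0.9 1.33
"""

flagged = poorly_packed(table.splitlines())
print(flagged)
```

For these rows only residue 8 (LEU, Elj 0.8) is reported. A stricter or looser cutoff can be passed as elj_cutoff.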
5) a table of total measured energies minus expected energies for residues in different
environments: buried, middle and surface. When creating novel structures, we have found
it difficult to get Elj values that are zero or negative for the buried residues.
example:
actual-average(in pdb) energies per residue
        LJatr LJrep   Elj
buried   -0.1  -0.2  -0.3
middle   -0.3  -0.1  -0.3
surfac    0.1  -0.1   0.0
6) a table of starting minus finishing chi angles, and absolute chi angles
7) phi, psi and omega angles for each residue
Tendencies:
RosettaDesign Tendencies
In some cases RosettaDesign makes choices that appear odd, and it helps
to know beforehand what these tendencies are. In these situations it
is probably best to use a resfile to steer Rosetta away from these
pitfalls.
1) The program likes to put amino acids with similar chemical properties near each other.
This is primarily because polar residues can hydrogen bond with each other, and hydrophobic
residues can pack without burying hydrogen-bonding groups. As a result, you may sometimes
observe a large cluster of hydrophobic residues on the surface of a protein, or a cluster of
polar residues in the core. In some cases this can be avoided by forcing key residues to be
polar or hydrophobic.
2) Sometimes polar groups are buried without a hydrogen-bonding partner. The
energy function has been parameterized to try to avoid this, but there is
no filter that prevents it.
File formats:
1. PDB file format description:
Please go to PDB Format Description for more details.
Record Format:
COLUMNS   DATA TYPE      FIELD        DEFINITION
---------------------------------------------------------------------------------
 1 -  6   Record name    "ATOM  "
 7 - 11   Integer        serial       Atom serial number.
13 - 16   Atom           name         Atom name.
17        Character      altLoc       Alternate location indicator.
18 - 20   Residue name   resName      Residue name.
22        Character      chainID      Chain identifier.
23 - 26   Integer        resSeq       Residue sequence number.
27        AChar          iCode        Code for insertion of residues.
31 - 38   Real(8.3)      x            Orthogonal coordinates for X in Angstroms.
39 - 46   Real(8.3)      y            Orthogonal coordinates for Y in Angstroms.
47 - 54   Real(8.3)      z            Orthogonal coordinates for Z in Angstroms.
55 - 60   Real(6.2)      occupancy    Occupancy.
61 - 66   Real(6.2)      tempFactor   Temperature factor.
73 - 76   LString(4)     segID        Segment identifier, left-justified.
77 - 78   LString(2)     element      Element symbol, right-justified.
79 - 80   LString(2)     charge       Charge on the atom.
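Since ATOM records are fixed-width, fields should be read by column position rather than by splitting on whitespace. The following sketch slices one record according to the column table above; the function name parse_atom_record and the example coordinates are made up for illustration.

```python
def parse_atom_record(line):
    """Slice one ATOM record into fields using the column table above.

    The 1-based, inclusive columns of the table become 0-based,
    end-exclusive Python slices (columns 7-11 -> line[6:11]).
    """
    if not line.startswith("ATOM  "):
        raise ValueError("not an ATOM record")
    line = line.ljust(80)  # pad short lines so every slice is valid
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "altLoc": line[16].strip(),
        "resName": line[17:20].strip(),
        "chainID": line[21].strip(),
        "resSeq": int(line[22:26]),
        "iCode": line[26].strip(),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "tempFactor": float(line[60:66]),
        "segID": line[72:76].strip(),
        "element": line[76:78].strip(),
        "charge": line[78:80].strip(),
    }


# A made-up record, assembled piece by piece so the columns are explicit.
example = (
    "ATOM  "      # 1-6   record name
    "    1"       # 7-11  serial
    " "           # 12    unused
    " N  "        # 13-16 atom name
    " "           # 17    altLoc
    "MET"         # 18-20 resName
    " "           # 21    unused
    "A"           # 22    chainID
    "   1"        # 23-26 resSeq
    " "           # 27    iCode
    "   "         # 28-30 unused
    "  27.340"    # 31-38 x
    "  24.430"    # 39-46 y
    "   2.614"    # 47-54 z
    "  1.00"      # 55-60 occupancy
    "  9.67"      # 61-66 tempFactor
    "      "      # 67-72 unused
    "    "        # 73-76 segID
    " N"          # 77-78 element
    "  "          # 79-80 charge
)

atom = parse_atom_record(example)
print(atom["resName"], atom["chainID"], atom["resSeq"], atom["x"], atom["y"], atom["z"])
```

Files that were edited by hand or produced by other tools often shift columns by a character or two, which is a common reason for a file not being read as a typical PDB file.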
2. Resfile format description:
This file specifies which residues will be varied
Column 2: Chain
Column 4-7: sequential residue number
Column 9-12: pdb residue number
Column 14-18: id (described below)
Column 20-40: amino acids to be used
NATAA => use native amino acid
ALLAA => all amino acids
NATRO => native amino acid and rotamer
PIKAA => select individual amino acids
POLAR => polar amino acids
APOLA => apolar amino acids
The following demo lines are in the proper format
 A    1    3 NATAA
 A    2    4 ALLAA
 A    3    6 NATRO
 A    4    7 NATAA
 B    5    1 PIKAA DFLM
 B    6    2 PIKAA HIL
 B    7    3 POLAR
-------------------------------------------------
start
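Residue entries in a resfile are column-based, so it is easier to generate them than to type them. Here is a minimal sketch that lays out one entry using the column positions listed above. The function name resfile_line is hypothetical, and right-justifying the two residue numbers within their columns is an assumption, not something the description above states.

```python
# The six ids listed in the resfile description.
RESFILE_IDS = {"NATAA", "ALLAA", "NATRO", "PIKAA", "POLAR", "APOLA"}


def resfile_line(chain, seq_num, pdb_num, res_id, amino_acids=""):
    """Format one residue entry.

    Column 2: chain; 4-7: sequential residue number; 9-12: pdb residue
    number; 14-18: id; 20-40: amino acids (only needed for PIKAA).
    Numbers are right-justified in their columns (assumption).
    """
    if len(chain) != 1:
        raise ValueError("chain must be a single character")
    if res_id not in RESFILE_IDS:
        raise ValueError(f"unknown id: {res_id}")
    if res_id == "PIKAA" and not amino_acids:
        raise ValueError("PIKAA needs a list of amino acids")
    if not (0 <= seq_num <= 9999 and -999 <= pdb_num <= 9999):
        raise ValueError("residue numbers must fit in four columns")
    if len(amino_acids) > 21:
        raise ValueError("amino acid list must fit in columns 20-40")
    line = f" {chain} {seq_num:>4} {pdb_num:>4} {res_id:<5} {amino_acids}"
    return line.rstrip()


# Recreate three of the demo lines.
entries = [
    resfile_line("A", 1, 3, "NATAA"),
    resfile_line("B", 5, 1, "PIKAA", "DFLM"),
    resfile_line("B", 7, 3, "POLAR"),
]
for entry in entries:
    print(entry)
```

The same function can be used to force a surface position to POLAR or a core position to APOLA when working around the tendencies described earlier.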
FAQs:
1. Can RosettaDesign calculate only the free energy of the native protein,
instead of doing a design?
Yes. The easiest way is to submit a job that does not vary the sequence
(use the "create list" option and make all residues fixed).
2. What is MAX_RES in the log file?
MAX_RES is the total number of residues in your native protein. The current limit for MAX_RES is 1000.
3. What is the difference between "MAX_RES" and "maximum residues can be varied"?
"MAX_RES" is the total number of residues in your native protein; its current limit is 1000.
"maximum residues can be varied" is the number of residues you choose to redesign.
4. What does "can't find starting residue in pdb file" mean in the log file?
It usually means that the uploaded file is not in the standard PDB format, so the server cannot
find the starting residue to redesign.
5. What is "unrecognized residue:"?
The server reads the three-letter codes of the 20 standard amino acids from the PDB file. If the file
contains one or more residue names that are not among these 20 amino acids, the server cannot recognize them.
6. What is "missing backbone atoms"?
It means that some backbone atoms are missing from the PDB file, so the server cannot get the
information it needs for the redesign.
Papers:
1:
Design of a novel globular protein fold with atomic-level accuracy.
Kuhlman B, et al.
Science. 2003 Nov 21;302(5649):1364-8.
PMID: 14631033 [PubMed - in process]
2:
A large scale test of computational protein design: folding and stability
of nine completely redesigned globular proteins.
Dantas G, et al.
J Mol Biol. 2003 Sep 12;332(2):449-60.
PMID: 12948494 [PubMed - indexed for MEDLINE]
3:
Crystal structures and increased stabilization of the protein G variants
with switched folding pathways NuG1 and NuG2.
Nauli S, et al.
Protein Sci. 2002 Dec;11(12):2924-31.
PMID: 12441390 [PubMed - indexed for MEDLINE]
4:
Accurate computer-based design of a new backbone conformation in the
second turn of protein L.
Kuhlman B, et al.
J Mol Biol. 2002 Jan 18;315(3):471-7.
PMID: 11786026 [PubMed - indexed for MEDLINE]
5:
Conversion of monomeric protein L to an obligate dimer by computational
protein design.
Kuhlman B, et al.
Proc Natl Acad Sci U S A. 2001 Sep 11;98(19):10687-91. Epub 2001 Aug 28.
Erratum in: Proc Natl Acad Sci U S A 2002 May 28;99(11):7809.
PMID: 11526208 [PubMed - indexed for MEDLINE]
6:
Computer-based redesign of a protein folding pathway.
Nauli S, et al.
Nat Struct Biol. 2001 Jul;8(7):602-5.
PMID: 11427890 [PubMed - indexed for MEDLINE]
7:
Native protein sequences are close to optimal for their structures.
Kuhlman B, et al.
Proc Natl Acad Sci U S A. 2000 Sep 12;97(19):10383-8. Erratum in: Proc Natl
Acad Sci U S A. 2000 Nov 21;97(24):13460.
PMID: 10984534 [PubMed - indexed for MEDLINE]