The sequence-structure gap is large (107 million sequences vs 137K structures), so it is quite likely that a specific protein of interest has no experimentally solved 3D structure. For such proteins it may be possible to predict the secondary and tertiary structure from the amino acid sequence. Many different programs have been designed for structure prediction. In the 2016 CASP12 (Critical Assessment of protein Structure Prediction) competition, essentially a bake-off for fold prediction programs, more than 100 different programs were tested, with varying degrees of success.
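To put the figures above in perspective, here is a rough worked ratio. The 137K figure counts structure entries, and many of these are redundant structures of the same protein. The true coverage of distinct proteins is therefore even lower than this suggests.

$$
\frac{107\times10^{6}\ \text{sequences}}{137\times10^{3}\ \text{structures}} \approx 781,
\qquad
\frac{137\times10^{3}}{107\times10^{6}} \approx 0.13\%
$$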
In this practical we will make predictions for two small proteins which do not have a known (experimentally solved) 3D structure.
UniProt ID | Protein description | Organism |
---|---|---|
A6NFH5 | Fatty acid-binding protein | Human |
Q93VI0 | DNA-binding protein S1FA3 | Arabidopsis |
```
>sp|A6NFH5|Fatty acid-binding protein 12 OS=Homo sapiens
MIDQLQGTWKSISCENSEDYMKELGIGRASRKLGRLAKPTVTISTDGDVITIKTKSIFKN
NEISFKLGEEFEEITPGGHKTKSKVTLDKESLIQVQDWDGKETTITRKLVDGKMVVESTV
NSVICTRTYEKVSSNSVSNS
>sp|Q93VI0|DNA-binding protein S1FA3 OS=Arabidopsis thaliana
MAAEFDGKIESKGLNPGLIVLLVIGGLLLTFLVGNFILYTYAQKNLPPRKKKPVSKKKMK
KEKMKQGVQVPGE
```
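Before submitting the sequences, it can help to check them locally. The sketch below parses the two FASTA records above and reports their lengths. It is a minimal sketch that assumes simple `>db|ACCESSION|description` headers and no blank or comment lines inside a record. The function name `parse_fasta` is illustrative, not part of any library.

```python
# Minimal FASTA parser (sketch). ASSUMPTION: headers look like
# ">db|ACCESSION|description" and each record is sequence lines only.
FASTA = """\
>sp|A6NFH5|Fatty acid-binding protein 12 OS=Homo sapiens
MIDQLQGTWKSISCENSEDYMKELGIGRASRKLGRLAKPTVTISTDGDVITIKTKSIFKN
NEISFKLGEEFEEITPGGHKTKSKVTLDKESLIQVQDWDGKETTITRKLVDGKMVVESTV
NSVICTRTYEKVSSNSVSNS
>sp|Q93VI0|DNA-binding protein S1FA3 OS=Arabidopsis thaliana
MAAEFDGKIESKGLNPGLIVLLVIGGLLLTFLVGNFILYTYAQKNLPPRKKKPVSKKKMK
KEKMKQGVQVPGE
"""

def parse_fasta(text):
    """Return a dict mapping each accession to its full sequence."""
    records = {}
    accession = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip any stray blank lines
        if line.startswith(">"):
            # The accession is the second '|'-separated field of the header.
            accession = line[1:].split("|")[1]
            records[accession] = []
        else:
            records[accession].append(line)
    # Join the wrapped sequence lines into one string per record.
    return {acc: "".join(parts) for acc, parts in records.items()}

sequences = parse_fasta(FASTA)
for acc, seq in sequences.items():
    print(acc, len(seq))
```

For anything beyond a quick check, Biopython's `SeqIO.parse(handle, "fasta")` does the same job more robustly.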
This list provides a good summary of the many tools available for structure prediction. For this practical we will use tools that are relatively quick to give results. These might not be the 'best' tools but they will give you a feel for how such tools work.
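In this practical we will:

- predict the secondary structure and solvent accessibility of two proteins using Jpred4 and NetSurfP
- build 3D homology models of the same two proteins using SWISS-MODEL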
Jpred4 is a web-based protein secondary structure and solvent accessibility prediction server. It is based on the JNet algorithm (which uses a neural network) and has a published three-state (helix, strand, coil) prediction accuracy of 82% (Drozdetskiy et al., 2015).
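Three-state accuracies such as the 82% quoted above are conventionally reported as the Q3 score. This is the percentage of residues whose predicted state (H, E or C) matches the state assigned from an experimentally solved structure. Q3 is an average over a benchmark set, so the score for any individual protein can be well above or below it.

$$
Q_3 = \frac{N_H^{\text{correct}} + N_E^{\text{correct}} + N_C^{\text{correct}}}{N_{\text{residues}}} \times 100\%
$$

For example, A6NFH5 is 140 residues long. At $Q_3 = 82\%$, we would expect about $0.82 \times 140 \approx 115$ residues to be assigned the correct state, and roughly 25 to be wrong.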
NetSurfP is a server that predicts the surface accessibility and secondary structure of an amino acid sequence using a neural network (Petersen et al., 2009).
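Surface accessibility is usually expressed as relative solvent accessibility (RSA). RSA is a residue's accessible surface area divided by the maximum possible area for that residue type. RSA is then used to call each residue buried or exposed. The sketch below shows that logic under two assumptions. First, it uses a 25% cut-off; check the NetSurfP documentation for the exact rule the server applies. Second, the RSA values are hypothetical placeholders, not real server output.

```python
# Sketch: buried/exposed calls from relative solvent accessibility.
# ASSUMPTIONS: a 25% RSA threshold; the example values are hypothetical.
def rsa(asa, max_asa):
    """Relative solvent accessibility = observed ASA / maximum ASA."""
    return asa / max_asa

def classify(rsa_values, threshold=0.25):
    """Return a string with 'B' (buried) or 'E' (exposed) per residue."""
    return "".join("E" if v >= threshold else "B" for v in rsa_values)

example_rsa = [0.05, 0.40, 0.22, 0.80, 0.25, 0.10]  # hypothetical values
print(rsa(30.0, 120.0))       # 0.25 (hypothetical areas)
print(classify(example_rsa))  # BEBEEB
```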
Submit each of the sequences A6NFH5 and Q93VI0 to the Jpred and NetSurfP servers and compare the results.
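One simple way to compare the two servers is per-residue agreement between their three-state predictions, plus a list of where each one places helices and strands. The sketch below assumes you have reduced each server's output to a string of H, E and coil symbols of the same length. Coil may be written as `-` or `C`, so both are normalised to `C`. The two example strings are hypothetical.

```python
# Sketch: compare two three-state secondary structure predictions.
# ASSUMPTION: each prediction is a string of H (helix), E (strand) and a
# coil symbol ('C' or '-'); the example strings are hypothetical.
import re

def normalise(ss):
    """Map every symbol other than H/E to 'C' so both use H/E/C."""
    return "".join(c if c in "HE" else "C" for c in ss.upper())

def agreement(a, b):
    """Fraction of positions where two equal-length predictions agree."""
    a, b = normalise(a), normalise(b)
    if len(a) != len(b):
        raise ValueError("predictions must cover the same residues")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def segments(ss, state):
    """Return 1-based inclusive (start, end) runs of `state` ('H' or 'E')."""
    return [(m.start() + 1, m.end()) for m in re.finditer(f"{state}+", normalise(ss))]

jpred = "--HHHHHHH---EEEE--"      # hypothetical Jpred-style string
netsurfp = "CCHHHHHCCCCCEEEEEC"   # hypothetical NetSurfP-derived string
print(agreement(jpred, netsurfp))                     # 15/18 positions agree
print(segments(jpred, "H"), segments(netsurfp, "H"))  # [(3, 9)] [(3, 7)]
```

Listing the segments as well as the overall score shows whether the servers disagree about whole elements or only about where an element starts and ends.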
Making secondary structure predictions gives us some understanding of how the proteins might be arranged in 3D space. But to understand how sequence features influence protein function, we need to know how these secondary structure elements fold together in 3D.
There are a number of software options for protein structure modelling. In this practical we will use SWISS-MODEL, a fully automated protein structure homology-modelling server, to make predictions for our two proteins.
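Homology modelling builds the model on an experimentally solved template. The sequence identity between target and template is therefore a rough guide to how much to trust the result, and higher identity generally gives a more reliable model. The sketch below computes percent identity from an existing pairwise alignment. It assumes identity is measured over columns where neither sequence has a gap; other tools may divide by a different length, so numbers will not always match a server's report. The alignment is hypothetical.

```python
# Sketch: percent identity from a pairwise alignment.
# ASSUMPTION: identity = identical residues / columns with no gap in
# either sequence; other tools may use a different denominator.
def percent_identity(aln_a, aln_b, gap="-"):
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)

target   = "MIDQLQGTWKSIS-CENSE"  # hypothetical aligned target fragment
template = "MVDKLQGTWK-ISDCENAE"  # hypothetical aligned template fragment
print(round(percent_identity(target, template), 1))  # 82.4
```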
Submit each of the above sequences Q93VI0 and A6NFH5 to SWISS-MODEL.
If you have more time, explore the use of some of the other prediction servers e.g.