05 - But what if we have no structure? Can 3D structure be predicted from sequence?

Introduction

The sequence-structure gap is BIG (107 million sequences vs 137K structures), so it's quite likely that a specific protein of interest has no experimentally solved 3D structure. For such proteins it may be possible to predict the secondary and tertiary structure from the amino acid sequence. There are many different programs designed for structure prediction. In the 2016 CASP12 (Critical Assessment of protein Structure Prediction) competition, essentially a bake-off for fold prediction programs, more than 100 different programs were tested, all with varying degrees of success.

In this practical we will make predictions for two small proteins which do not have a known (experimentally solved) 3D structure.

Uniprot ID    Protein Description              Organism
A6NFH5        Fatty acid-binding protein 12    Human
Q93VI0        DNA-binding protein S1FA3        Arabidopsis

>sp|A6NFH5|Fatty acid-binding protein 12 OS=Homo sapiens
MIDQLQGTWKSISCENSEDYMKELGIGRASRKLGRLAKPTVTISTDGDVITIKTKSIFKN
NEISFKLGEEFEEITPGGHKTKSKVTLDKESLIQVQDWDGKETTITRKLVDGKMVVESTV
NSVICTRTYEKVSSNSVSNS


>sp|Q93VI0|DNA-binding protein S1FA3 OS=Arabidopsis thaliana 
MAAEFDGKIESKGLNPGLIVLLVIGGLLLTFLVGNFILYTYAQKNLPPRKKKPVSKKKMK
KEKMKQGVQVPGE
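
Before submitting the sequences to a server it can help to check that they were copied completely. The following is a minimal Python sketch, not part of the original practical, that reads the two FASTA records above and prints the length of each sequence. The function name parse_fasta is an illustrative choice, and the header parsing assumes the ">sp|ACCESSION|description" layout used here.

```python
# The two FASTA records from this practical, copied verbatim.
FASTA = """\
>sp|A6NFH5|Fatty acid-binding protein 12 OS=Homo sapiens
MIDQLQGTWKSISCENSEDYMKELGIGRASRKLGRLAKPTVTISTDGDVITIKTKSIFKN
NEISFKLGEEFEEITPGGHKTKSKVTLDKESLIQVQDWDGKETTITRKLVDGKMVVESTV
NSVICTRTYEKVSSNSVSNS
>sp|Q93VI0|DNA-binding protein S1FA3 OS=Arabidopsis thaliana
MAAEFDGKIESKGLNPGLIVLLVIGGLLLTFLVGNFILYTYAQKNLPPRKKKPVSKKKMK
KEKMKQGVQVPGE
"""


def parse_fasta(text):
    """Return a dict mapping each accession to its sequence (line breaks removed)."""
    records = {}
    accession = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            # Assumed header layout: ">sp|ACCESSION|description".
            accession = line.split("|")[1]
            records[accession] = ""
        elif accession is not None:
            records[accession] += line
    return records


sequences = parse_fasta(FASTA)
for accession, seq in sequences.items():
    print(accession, len(seq), "residues")
```

Both proteins are short, which is one reason the servers used below return results quickly.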

This list provides a good summary of the many tools available for structure prediction. For this practical we will use tools that are relatively quick to give results. These might not be the 'best' tools but they will give you a feel for how such tools work.

In this practical we will:

  1. predict the secondary structure of the proteins using Jpred4 and NetSurfP;
  2. predict the 3D structure of the proteins using SWISS-MODEL.

Secondary structure prediction

  • Jpred4 is a web-based protein secondary structure and solvent accessibility prediction server based on the JNet algorithm (which uses a neural network), with a published three-state (helix, strand, coil) prediction accuracy of 82% (Drozdetskiy et al., 2015).

  • NetSurfP is a server that predicts the surface accessibility and secondary structure of an amino acid sequence using a neural network (Petersen et al., 2009).
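
A three-state accuracy such as the 82% quoted above is usually reported as Q3: the fraction of residues whose predicted state (H for helix, E for strand, C for coil) matches the state observed in a solved structure. The sketch below shows the arithmetic only. The function name q3_accuracy and the two 10-residue strings are made up for illustration and are not real data for either protein.

```python
def q3_accuracy(predicted, observed):
    """Fraction of residues whose predicted state (H, E or C) equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("state strings must have the same length")
    if not predicted:
        raise ValueError("state strings must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(predicted)


# Hypothetical example: one state per residue, same length as the sequence.
observed = "CHHHHCEEEC"
predicted = "CHHHCCEEEC"
print(f"Q3 = {q3_accuracy(predicted, observed):.0%}")
```

Note that Q3 can only be computed when an experimental structure exists, which is not the case for the two proteins in this practical.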

Exercise 01: [10 mins] Secondary structure prediction.

Submit each of the sequences A6NFH5 and Q93VI0 to the Jpred4 and NetSurfP servers and compare the results.

  • Did Jpred4 operate in the same way for both sequences?
  • What secondary structure elements are predicted for A6NFH5?
  • Are the predictions from each server the same for A6NFH5?
  • What secondary structure elements are predicted for Q93VI0?
  • Are the predictions from each server the same for Q93VI0?
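
To answer the comparison questions above systematically, one option is to write each server's prediction as a string with one letter per residue and then list the predicted elements and the positions where the servers disagree. This is a rough sketch, assuming you have converted both outputs by hand to the same H/E/C alphabet. The function names and the two 15-residue strings are hypothetical and are not actual server output.

```python
from itertools import groupby


def segments(states):
    """Collapse a per-residue state string into (state, start, end) runs, 1-based inclusive."""
    result = []
    position = 1
    for state, run in groupby(states):
        length = len(list(run))
        result.append((state, position, position + length - 1))
        position += length
    return result


def differing_positions(a, b):
    """Return the 1-based residue positions at which two equal-length predictions differ."""
    if len(a) != len(b):
        raise ValueError("predictions must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


# Hypothetical predictions for the same 15-residue stretch.
server_a = "CCHHHHHCCEEEECC"
server_b = "CCHHHHCCCEEEEEC"

for state, start, end in segments(server_a):
    if state != "C":  # report only helices and strands
        print(f"server A: {state} from residue {start} to {end}")
print("positions that differ:", differing_positions(server_a, server_b))
```

Disagreements typically cluster at the ends of helices and strands, so it is worth checking whether the two servers differ in the elements themselves or only in their boundaries.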

Tertiary structure prediction

Making secondary structure predictions gives us some understanding of how the proteins might operate in 3D space. But to get a greater understanding of how sequence features influence protein function, we need to know how these secondary structure elements fold together in 3D.

There are a number of software options for protein structure modelling. In this practical we will use SWISS-MODEL, a fully automated protein structure homology-modelling server, to make predictions for our two proteins.

Exercise 02: [15 mins] Tertiary structure prediction.

Submit each of the above sequences Q93VI0 and A6NFH5 to SWISS-MODEL.

  • What template is used to model Q93VI0 and what is the sequence similarity?
  • What template is used to model A6NFH5 and what is the sequence similarity?
  • What 3D folds are predicted for each of the sequences?
  • How do you evaluate the accuracy of the models you have built?
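
Homology modelling depends on how closely the target sequence resembles the template, so it helps to know how a figure such as percent identity is obtained from a pairwise alignment. The sketch below counts identical residues over the columns where both sequences have a residue. It illustrates the general idea only and is not necessarily the exact definition SWISS-MODEL uses for its reported values. It also computes identity, which is simpler than the similarity asked about above. The function name percent_identity and the short aligned strings are hypothetical.

```python
def percent_identity(aligned_target, aligned_template):
    """Percent of gap-free alignment columns in which the two residues are identical."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        aligned += 1
        if a == b:
            identical += 1
    if aligned == 0:
        raise ValueError("alignment has no gap-free columns")
    return 100.0 * identical / aligned


# Hypothetical alignment fragment; "-" marks a gap.
target = "MIDQLQG-TWK"
template = "MVDKFVGSTWK"
print(f"{percent_identity(target, template):.1f}% identity")
```

A model built on a low-identity template should be treated with more caution, which is worth keeping in mind when comparing the results for the two proteins.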

If you have more time, explore some of the other prediction servers, e.g.:

  • PSIPRED: a protein sequence analysis workbench that includes secondary structure prediction and fold recognition.

  • Phyre2: a server for protein modelling and prediction.