01 - UniProt (browser)

Table of Contents

  1. Introduction
  2. The UniProt website
    1. Using UniProtKB
    2. Advanced Searches in UniProtKB
  3. Exercise 01

Introduction

UniProt is a comprehensive protein sequence and annotation resource, maintained by a consortium of the European Bioinformatics Institute (EMBL-EBI), the Swiss Institute of Bioinformatics (SIB), and the Protein Information Resource (PIR). UniProt unifies several legacy databases, including Swiss-Prot, TrEMBL, iProClass and the PIR-PSD.

UniProt provides three key databases:

UniProtKB is likely to be the database you use most frequently to find information on gene product/protein molecular function. It is the central hub of functional information on proteins, and collates functional annotations from many other databases, ontologies, and references. It keeps records of how annotations are derived (e.g. experimentally or computationally), with evidence codes, and is divided into two sections: one contains manually-annotated records (UniProtKB/Swiss-Prot), and a second contains computational annotations that are waiting for manual curation (UniProtKB/TrEMBL).

UniParc is a comprehensive, non-redundant database that contains most of the publicly-available protein sequences from a range of sources.

UniRef provides pre-clustered sets of sequences from UniProtKB and UniParc. A number of clusterings at different stringencies are provided.

These databases can be queried in a number of ways, including:

Resources

The UniProt website

UniProt landing page

The landing page offers options for each of the three main databases: UniProtKB, UniRef, and UniParc. It also offers sets of complete proteomes for a range of organisms, and databases of proteins organised by supporting data, such as literature, taxonomic classification, and subcellular location.

Using UniProtKB

  • Click on the UniProtKB link. This will take you to the UniProtKB front page, with a summary of entries, and a number of links.

UniProtKB landing page

QUESTIONS
  1. How many records are in UniProtKB today?
  2. How many of those records have been manually reviewed? What proportion of the total database has been manually reviewed?
  3. Which organisms are most highly represented in the database, today?
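The proportion asked for in question 2 is a simple ratio of two counts read off the page. A minimal sketch follows, in which the function name `reviewed_fraction` and both example counts are hypothetical and chosen only for illustration; substitute the numbers you see on the UniProtKB page today.

```python
def reviewed_fraction(reviewed: int, total: int) -> float:
    """Return the fraction of entries that are manually reviewed."""
    if total <= 0:
        raise ValueError("total must be a positive number of entries")
    return reviewed / total


# Hypothetical counts, NOT real UniProtKB statistics.
example_total = 200_000
example_reviewed = 500

# Format the fraction as a percentage with two decimal places.
print(f"{reviewed_fraction(example_reviewed, example_total):.2%}")
```

The same function can be reused for the Kitasatospora results later in this section.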

  • Enter the word Kitasatospora in the search bar at the top of the page, and click on the Search button.

UniProtKB Kitasatospora entries

QUESTIONS
  1. How many entries are returned?
  2. How many of those entries have been manually reviewed? What proportion of the total is this?

Filtering Results

  • On the left of the page there's an option to filter "kitasatospora" as an organism or by taxonomy. Click on the `organism` filter.

QUESTIONS
  1. How many entries are returned?
  2. How many of those entries have been manually reviewed?
  3. How has the search term in the top bar changed? NOTE: these search term changes will be useful for querying UniProt programmatically.
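The text in the search bar is the same kind of query string that a programmatic request would carry. As a rough sketch, the code below turns a search term into a request URL without contacting any server. The endpoint `https://rest.uniprot.org/uniprotkb/search`, the parameter names, and the field name `organism_name` are assumptions based on the current UniProt REST service and may not match the version of the website you are using; in practice, copy the term exactly as it appears in your search bar.

```python
from urllib.parse import urlencode

# Assumed endpoint of the UniProt REST service; check the UniProt help pages.
BASE_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(query: str, fmt: str = "tsv", size: int = 25) -> str:
    """Build (but do not send) a search URL for the given query string."""
    params = {"query": query, "format": fmt, "size": size}
    # urlencode percent-encodes characters such as ':' and spaces.
    return f"{BASE_URL}?{urlencode(params)}"


# Assumed field name; replace with the text shown in your own search bar.
search_bar_text = "organism_name:kitasatospora"
print(build_search_url(search_bar_text))
```

Building the URL separately from sending it makes it easy to check the encoded query by eye before running a real request.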

  • At the top left of the page there's an option to filter only the manually reviewed entries. Click on this filter.

UniProtKB Kitasatospora reviewed entries

Inspecting an Entry

  • Click on the accession link for the entry Q9AJE3.

UniProtKB reviewed entry header

QUESTIONS
  1. What kind of evidence is there for protein function?
  2. Kinetic information for this enzyme is drawn from which other database(s)?
  3. Are any protein structures available for this enzyme?

  • At the top of the page, there's a button marked History. Click on this button. A small window will open, with a link to Previous versions. Click on this link.

UniProtKB history window

UniProtKB history

  • Click on the Compare button.

QUESTIONS
  1. When was the last change made to this record?
  2. What was the change?

  • Compare some previous records to the current record (e.g. this comparison).

QUESTIONS
  1. What kinds of changes do you see?
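A comparison like the one shown by the Compare button can be approximated offline with a line-by-line diff. This is only a sketch using Python's standard `difflib` module, assuming you have saved two versions of an entry as plain text; the two short texts below are hypothetical stand-ins, not real UniProt records, and the website's own comparison may be computed differently.

```python
import difflib


def diff_versions(old_text, new_text, old_label="previous", new_label="current"):
    """Return unified-diff lines describing how old_text became new_text."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=old_label,
            tofile=new_label,
            lineterm="",
        )
    )


# Hypothetical entry fragments for illustration only.
old_version = "ID   EXAMPLE_ENTRY\nDE   Example protein\nKW   Transferase"
new_version = "ID   EXAMPLE_ENTRY\nDE   Example protein\nKW   Transferase; Magnesium"

for line in diff_versions(old_version, new_version):
    print(line)
```

Lines starting with `-` appear only in the older version, and lines starting with `+` appear only in the newer one.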

Advanced Searches in UniProtKB

At the top of the `UniProtKB` page you've probably noticed a drop-down button marked "Advanced". This lets you combine several search filters to conduct powerful searches, and home in on the proteins of most interest to you in the UniProtKB database.

In this section, you'll use the advanced searches to identify candidate human proteins that are found in the nucleus, and have been associated with some disease activity or function.

  • Click on the UniProt logo to return to the landing page.
  • Click on the "Advanced" drop-down to get the advanced searching interface.

UniProtKB advanced search

  • In the first field, select Organism [OS] with search term "Human". The dropdown will offer you several options as you type, but do not select them (you could also have entered the organism as "Homo sapiens" here).
  • In the next field, keep AND on the left, and select Subcellular location with search term "nucleus". The dropdown will offer several options but, again, do not select them. At this point, allow any assertion method for the evidence code.

UniProtKB advanced search - first terms

  • In the next field, keep AND on the left, and select Pathology & Biotech with class "Disease", and no search term. At this point, allow any assertion method for the evidence code.

UniProtKB advanced search - second terms

  • Click on the Search button.

QUESTIONS
  1. How many results do you see, today?
  2. What are the contents of the search bar? NOTE: this will be useful for programmatic queries, later.
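The advanced search form joins its rows with AND, and the search bar shows the combined result as one string. A minimal sketch of the same idea follows. The helper `combine_with_and` is hypothetical, and the field names `organism_id`, `cc_scl_term` and `cc_disease` are assumptions taken from the current UniProt query syntax, so they may differ from what your search bar displays; treat your own search bar as the authority.

```python
def combine_with_and(clauses):
    """Join non-empty query clauses with AND, wrapping each in parentheses."""
    cleaned = [c.strip() for c in clauses if c and c.strip()]
    return " AND ".join(f"({c})" for c in cleaned)


# Assumed field names; compare against the contents of your search bar.
clauses = [
    "organism_id:9606",     # human
    "cc_scl_term:nucleus",  # subcellular location: nucleus
    "cc_disease:*",         # any disease annotation
]
print(combine_with_and(clauses))
```

Wrapping each clause in parentheses keeps the grouping unambiguous if a clause later contains its own OR or AND.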

  • Click on the "Advanced" drop down. You should see that the current search populates this dialogue box.
  • Change the Evidence option for the `Pathology & Biotech` part of the search to manual "Experimental" evidence.
  • Click on the Search button.

UniProtKB advanced search - second terms

QUESTIONS
  1. How many results do you see, now?
  2. What are the contents of the search bar? NOTE: this will be useful for programmatic queries, later.

  • Click on the "Advanced" drop down. You should see again that the current search populates this dialogue box.
  • Change the Term option for the Pathology & Biotech part of the search to "melanoma".
  • Click on the Search button.

UniProtKB advanced search - second terms

QUESTIONS
  1. How many results do you see, now?
  2. How are these proteins associated with melanoma?
  3. What amino acid modifications have been found for these proteins?

Downloading UniProtKB Search Results

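After the search above, you should be left with a small set of proteins that satisfy the following criteria:

  • Derives from human
  • Is found in the nucleus
  • Is associated with melanoma as a disease, with experimental evidence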

If we would like to download these records (or those from any other search), we have a number of options, which are obtained by clicking on the download button at the top of the search results.

UniProtKB download button

  • Download the search results as tab-separated, text, and FASTA format files
  • Inspect the contents of these files

QUESTIONS
  1. How do the contents of these files differ?
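One way to see how the formats differ is to load them with a few lines of code. The sketch below reads FASTA and tab-separated text using only the Python standard library. The sample strings are hypothetical stand-ins for downloaded files (the accession `P00000` and the sequence are made up), and the column names in your own tab-separated download will depend on the columns you selected.

```python
import csv
import io


def read_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Sequence lines may be wrapped, so join them back together.
            records[header] += line
    return records


def read_tsv(text):
    """Return a list of dicts, one per row, keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


# Hypothetical file contents for illustration only.
fasta_text = (
    ">sp|P00000|EXAMPLE_HUMAN Hypothetical protein OS=Homo sapiens OX=9606 PE=1 SV=1\n"
    "MKT\nAAG\n"
)
tsv_text = "Entry\tLength\nP00000\t6\n"

for header, sequence in read_fasta(fasta_text).items():
    print(header.split("|")[1], len(sequence))
for row in read_tsv(tsv_text):
    print(row["Entry"], row["Length"])
```

FASTA carries the sequence plus a one-line header, while the tab-separated file carries selected annotation columns, so the two are complementary rather than interchangeable.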

Exercise 01 (15min)

Using the UniProtKB search tools, can you find and download sets of proteins that satisfy the following requirements:



Set 1

  • Derives from Saccharomyces cerevisiae
  • Is associated with a membrane
  • Is a transcriptional regulator

Set 2

  • Has a function associated with alginate
  • Is annotated with any biotechnological application

Set 3

  • Any manual annotation associated with biofuel as a biotechnological application

Set 4

  • Derives from mouse
  • Is an enzyme
  • Has not been manually reviewed
  • Is mentioned in a Nature publication
  • Is between 100 and 300aa in length