{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "# 01 - `UniProt` (browser)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Table of Contents\n", "\n", "1. [Introduction](#introduction)\n", "2. [The UniProt website](#uniprot)\n", " 1. [Using UniProtKB](#uniprotkb)\n", " 2. [Advanced Searches in UniProtKB](#uniprotkb_advanced)\n", "3. [Exercise 01](#ex01)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## Introduction\n", "\n", "

\n", "
\n", "UniProt is a comprehensive protein sequence and annotation resource, and is a consortium between the European Bioinformatics Institute (EMBL-EBI), the Swiss Institute of Bioinformatics (SIB), and the Protein Information Resource (PIR). UniProt provides unifies several legacy databases, uncluding Swiss-Prot, TrEMBL, iProClass and the PIR-PSD. \n", "
\n", "\n", "`UniProt` provides three key databases:\n", "\n", "* The UniProt Knowledgebase [(UniProtKB)](http://www.uniprot.org/help/uniprotkb)\n", "* The UniProt Reference Clusters [(UniRef)](http://www.uniprot.org/help/uniref)\n", "* The UniProt Archive [(UniParc)](http://www.uniprot.org/help/uniparc)\n", "\n", "`UniProtKB` is likely to be the database you use most frequently to find information on gene product/protein molecular function. It is the central hub of functional information on proteins, and collates functional annotations from many other databases, ontologies, and references. It records how each annotation was derived (e.g. experimentally or computationally) using *evidence codes*, and is divided into two sections: one contains manually-annotated records (`UniProtKB/Swiss-Prot`), and the other contains computationally-annotated records that have not yet been manually reviewed (`UniProtKB/TrEMBL`).\n", "\n", "`UniParc` is a comprehensive, non-redundant database that contains most of the publicly-available protein sequences from a range of sources.\n", "\n", "`UniRef` provides pre-clustered sets of sequences from `UniProtKB` and `UniParc`. Clusterings are provided at three levels of sequence identity: 100%, 90% and 50% (`UniRef100`, `UniRef90` and `UniRef50`).\n", "\n", "These databases can be queried in a number of ways, including:\n", "\n", "* At the `UniProt` website [http://www.uniprot.org/](http://www.uniprot.org/) in your web browser\n", "* Programmatically, by sending requests to the `UniProt` web services from a programming language\n", "\n", "### Resources\n", "* [`UniProt`](http://www.uniprot.org/)\n", "* [EMBL-EBI](http://www.ebi.ac.uk/)\n", "* [SIB](http://www.isb-sib.ch/)\n", "* [PIR](http://pir.georgetown.edu/)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## The `UniProt` website\n", "\n", "

\n", "
\n", " \n", "
\n", "\n", "![UniProt landing page](../assets/images/06-01_landing.png)\n", "\n", "The landing page offers options for each of the three main databases: UniProtKB, UniRef, and UniParc. It also offers sets of [complete proteomes for a range of organisms](http://www.uniprot.org/proteomes/), and ways of browsing proteins by supporting data, such as [literature](http://www.uniprot.org/citations/), [taxonomic classification](http://www.uniprot.org/taxonomy/), and [subcellular location](http://www.uniprot.org/locations/)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Using `UniProtKB`\n", "\n", "

\n", "
\n", " \n", "
\n", "\n", "![UniProtKB landing page](../assets/images/06-02_uniprotkb.png)\n", "\n", "

\n", "
\n", "QUESTIONS\n", "
    \n", "
  1. How many records are in UniProtKB today?\n", "
  2. How many of those records have been manually reviewed? What proportion of the total database has been manually reviewed?\n", "
  3. Which organisms are most highly represented in the database, today?\n", "
\n", "
\n", "\n", "
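The questions above can also be answered from code. The sketch below assumes the current UniProt REST API at `rest.uniprot.org` (not covered in this notebook), including its `/uniprotkb/search` endpoint, its `query`, `format` and `size` parameters, the `reviewed:true` query term, and the `X-Total-Results` response header that reports the total number of hits. Check these against the UniProt API documentation before relying on them.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed URL of the UniProtKB search endpoint of the UniProt REST API
SEARCH_URL = 'https://rest.uniprot.org/uniprotkb/search'


def count_url(query):
    # size=1 keeps the response small: only the header with the hit count matters
    return SEARCH_URL + '?' + urlencode({'query': query, 'format': 'list', 'size': 1})


def count_records(query):
    # Send the request (network access required) and read the assumed hit-count header
    with urlopen(Request(count_url(query))) as response:
        return int(response.headers['X-Total-Results'])


def reviewed_fraction(total, reviewed):
    # Proportion of all records that have been manually reviewed (Swiss-Prot)
    return reviewed / total
```

For example, `reviewed_fraction(count_records('*'), count_records('reviewed:true'))` would estimate the reviewed proportion asked about in question 2, assuming `*` matches every record.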

\n", "
\n", " \n", "
\n", "\n", "![UniProtKB Kitasatospora entries](../assets/images/06-03_kitasatospora.png)\n", "\n", "

\n", "
\n", "QUESTIONS\n", "
    \n", "
  1. How many entries are returned?\n", "
  2. How many of those entries have been manually reviewed? What proportion of the total is this?\n", "
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Filtering Results\n", "\n", "

\n", "
\n", " \n", "
\n", "\n", "

\n", "
\n", "QUESTIONS\n", "
    \n", "
  1. How many entries are returned?\n", "
  2. How many of those entries have been manually reviewed?\n", "
  3. How has the search term in the top bar changed? NOTE: these search term changes will be useful for querying UniProt programmatically.\n", "
\n", "
\n", "\n", "
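Applying a filter on the website amounts to adding an extra clause to the query with `AND`. A minimal sketch of the same idea in code, assuming the REST API's `reviewed:true` syntax for the Reviewed filter (the text shown in the website's search bar may differ) and the hypothetical helper names `add_filter` and `search_url`:

```python
from urllib.parse import urlencode

# Assumed URL of the UniProtKB search endpoint of the UniProt REST API
SEARCH_URL = 'https://rest.uniprot.org/uniprotkb/search'


def add_filter(query, clause):
    # Wrap the original query in parentheses so that the filter applies to all of it
    return '(' + query + ') AND ' + clause


def search_url(query, fmt='tsv'):
    # Build a URL that returns the matching entries in the requested format
    return SEARCH_URL + '?' + urlencode({'query': query, 'format': fmt})


reviewed_query = add_filter('Kitasatospora', 'reviewed:true')
print(reviewed_query)
```

The parentheses matter once the original query contains `OR` terms, which would otherwise bind differently from the added filter.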

\n", "
\n", " \n", "
\n", "\n", "![UniProtKB Kitasatospora reviewed entries](../assets/images/06-04_reviewed.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Inspecting an Entry\n", "\n", "

\n", "
\n", " \n", "
\n", "\n", "![UniProtKB reviewed entry header](../assets/images/06-05_evidence.png)\n", "\n", "

\n", "
\n", "QUESTIONS\n", "
    \n", "
  1. What kind of evidence is there for protein function?\n", "
  2. Kinetic information for this enzyme is drawn from which other database(s)?\n", "
  3. Are any protein structures available for this enzyme?\n", "
\n", "
\n", "\n", "
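Links to other databases, such as sources of kinetic data or the PDB, appear in an entry as cross-references. A minimal sketch of listing them in code, assuming the REST API serves an entry as JSON at `rest.uniprot.org/uniprotkb/<accession>.json` and stores its links in a `uniProtKBCrossReferences` list of objects with `database` and `id` keys. The `example_entry` below is a hand-made, hypothetical stand-in, not real data.

```python
import json
from urllib.request import urlopen

# Assumed URL pattern for a single UniProtKB entry in JSON format
ENTRY_URL = 'https://rest.uniprot.org/uniprotkb/{}.json'


def fetch_entry(accession):
    # Download and parse one entry (network access required)
    with urlopen(ENTRY_URL.format(accession)) as response:
        return json.load(response)


def cross_references(entry, database):
    # Collect the IDs of all cross-references to the named database, e.g. 'PDB'
    refs = entry.get('uniProtKBCrossReferences', [])
    return [ref['id'] for ref in refs if ref.get('database') == database]


# Hypothetical stand-in for a parsed entry, for illustration only
example_entry = {
    'uniProtKBCrossReferences': [
        {'database': 'PDB', 'id': 'EXAMPLE-PDB-ID'},
        {'database': 'BRENDA', 'id': 'EXAMPLE-BRENDA-ID'},
    ]
}
print(cross_references(example_entry, 'PDB'))
```

With a real accession, `cross_references(fetch_entry(accession), 'PDB')` would list any structures linked from the entry.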

\n", "
\n", " \n", "
\n", "\n", "![UniProtKB history window](../assets/images/06-06_history_link.png)\n", "\n", "![UniProtKB history](../assets/images/06-07_history_list.png)\n", "\n", "

\n", "
\n", " \n", "
\n", "\n", "

\n", "
\n", "QUESTIONS\n", "
    \n", "
  1. When was the last change made to this record?\n", "
  2. What was the change?\n", "
\n", "
\n", "\n", "

\n", "
\n", " \n", "
\n", " \n", "

\n", "
\n", "QUESTIONS\n", "
    \n", "
  1. What kinds of changes do you see?\n", "
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Advanced Searches in UniProtKB\n", "\n", "

\n", "
\n", "At the top of the `UniProtKB` page you've probably noticed a drop-down button marked \"Advanced\". This lets you combine several search filters to conduct powerful searches, and hone in on the proteins most of interest to you in the UniProtKB database.\n", "
\n", "\n", "In this section, you'll use the advanced searches to identify candidate human proteins that are found in the nucleus, and have been associated with some disease activity or function.\n", "\n", "
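An advanced search is a set of field-specific clauses joined with `AND`. The sketch below builds such a query string; the field names `organism_id`, `cc_scl_term` and `cc_disease` are assumptions based on the UniProt REST query syntax, and should be compared with the query that the Advanced search box actually writes into the search bar. `SL-0191` is the UniProt subcellular location identifier for the nucleus.

```python
def combine(*clauses):
    # Parenthesise each clause so that any OR terms inside it stay grouped
    return ' AND '.join('(' + clause + ')' for clause in clauses)


query = combine(
    'organism_id:9606',     # Homo sapiens (assumed field name)
    'cc_scl_term:SL-0191',  # found in the nucleus (assumed field name)
    'cc_disease:melanoma',  # disease annotation mentioning melanoma (assumed field name)
)
print(query)
```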

\n", "
\n", " \n", "
\n", "\n", "![UniProtKB advanced search](../assets/images/06-08_advanced.png)\n", "\n", "

\n", "
\n", " \n", "
\n", " \n", "![UniProtKB advanced search - first terms](../assets/images/06-09_search_1.png)\n", "\n", "

\n", "
\n", " \n", "
\n", "\n", "![UniProtKB advanced search - second terms](../assets/images/06-10_search_2.png)\n", "\n", "

\n", "
\n", " \n", "
\n", "\n", "
\n", "QUESTIONS\n", "
    \n", "
  1. How many results do you see, today?\n", "
  2. What are the contents of the search bar? NOTE: this will be useful for programmatic queries, later. \n", "
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Refine the search\n", "\n", "

\n", "
\n", " \n", "
\n", "\n", "![UniProtKB advanced search - second terms](../assets/images/06-11_search_3.png)\n", "\n", "

\n", "
\n", "QUESTIONS\n", "
    \n", "
  1. How many results do you see, now?\n", "
  2. What are the contents of the search bar? NOTE: this will be useful for programmatic queries, later. \n", "
\n", "
\n", "\n", "

\n", "
\n", " \n", "
\n", "\n", "![UniProtKB advanced search - second terms](../assets/images/06-12_search_4.png)\n", "\n", "

\n", "
\n", "QUESTIONS\n", "
    \n", "
  1. How many results do you see, now?\n", "
  2. How are these proteins associated with melanoma?\n", "
  3. What amino acid modifications have been found for these proteins?\n", "
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "### Downloading `UniProtKB` Search Results\n", "\n", "After the search above, you should be left with a small set of proteins that satisfy the following criteria:\n", "\n", "* They derive from *Homo sapiens*\n", "* They are annotated as being found in the nucleus (for which we allow any form of evidence)\n", "* They are associated with a disease term: \"melanoma\", and there is manually-curated experimental evidence for this association\n", "\n", "

\n", "
\n", "If we would like to download these records (or those from any other search), we have a number of options, which are obtained by clicking on the download button at the top of the search results.\n", "
\n", "\n", "![UniProtKB download button](../assets/images/06-13_download.png)\n", "\n", "* You can download all your search results, or just those selected with checkboxes\n", "* Results can be downloaded compressed (`gzip`ped) or as raw records\n", "* Results can be downloaded as:\n", " * sequence data (`FASTA`)\n", " * tabular form (`Excel`, `tab-separated`)\n", " * computer-readable (`XML`, `RDF`)\n", " \n", "![UniProtKB download button](../assets/images/06-14_download_options.png)\n", "\n", "

\n", "
\n", " \n", "
\n", "\n", "

\n", "
\n", "QUESTIONS\n", "
    \n", "
  1. How do the contents of these files differ?\n", "
\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "\n", "### Exercise 01 (15min)\n", "\n", "Using the `UniProtKB` search tools, can you find and download sets of proteins that satisfy the following requirements:\n", "\n", "

\n", "
\n", " \n", "

\n", "

Set 1

\n", "\n", " \n", "

\n", "

Set 2

\n", "\n", "\n", "

\n", "

Set 3

\n", "\n", " \n", "

\n", "

Set 4

\n", "\n", "
\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.9" } }, "nbformat": 4, "nbformat_minor": 2 }