09 - Programming for KEGG

Introduction

The KEGG browser interface, while able to integrate searches across comprehensive and quite disparate datasets, does not always present the most convenient interface to extract that information (such as downloading FASTA sequences for an entry). As with all browser-based interfaces, it can also be tedious and time-consuming to point-and-click your way through a large number of searches.

This notebook presents examples of using `KEGG` programmatically via the Biopython libraries. You will control the searches with Python code in the notebook cells.

As with all programmatic searches, there are a number of advantages to an automated approach:

  • It is easy to set up repeatable searches for many sequences, or collections of sequences
  • It is easy to read in the search results and conduct downstream analyses that add value to your search

Where it is impractical to submit a large number of queries through a web form (because it is tiresome to point-and-click over and over again), the queries can be handled programmatically instead. A programmatic approach can also expose options that help you refine a query beyond what the website interface offers, and it makes it trivial to repeat a query with exactly the same settings every time.

The Biopython interface to KEGG has other advantages that we will not cover in this lesson, such as a much greater range of image manipulations for the pathway maps that KEGG returns.

The `KEGG` interface is not as well documented as some other resources (such as `NCBI` or `Ensembl`), and `KEGG` does not provide any usage guidelines. To avoid risking overloading the service, Biopython restricts us to three calls per second.
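
If you loop over many queries yourself, one way to stay well within a limit like this is to space the calls out on the client side. The following is a minimal sketch only: `throttled` and its `min_interval`, `clock` and `sleep` parameters are hypothetical names chosen for illustration and are not part of Biopython.

```python
import time


def throttled(func, min_interval=1 / 3, clock=time.monotonic, sleep=time.sleep):
    """Wrap func so that successive calls start at least min_interval seconds apart.

    Hypothetical helper for illustration; not part of Biopython.
    """
    last_call = [None]  # start time of the previous call, None before the first call

    def wrapper(*args, **kwargs):
        if last_call[0] is not None:
            wait = min_interval - (clock() - last_call[0])
            if wait > 0:
                sleep(wait)  # pause only for the time still outstanding
        last_call[0] = clock()
        return func(*args, **kwargs)

    return wrapper


# Example with a stand-in function; in the notebook you might wrap REST.kegg_get instead
slow_double = throttled(lambda x: 2 * x, min_interval=0.01)
doubled = [slow_double(n) for n in range(3)]
```

The wrapper keeps no global state, so each wrapped function is limited independently of the others.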

Be warned also that the conditions of service include:

"This service should not be used for bulk data downloads".

Python imports

In [1]:
# Show plots as part of the notebook
%matplotlib inline

# Show images inline
from IPython.display import Image

# Standard library packages
import io
import os

# Import Biopython modules to interact with KEGG
from Bio import SeqIO
from Bio.KEGG import REST
from Bio.KEGG.KGML import KGML_parser
from Bio.Graphics.KGML_vis import KGMLCanvas

# Import Pandas, so we can use dataframes
import pandas as pd

Python functions

In the cell below, we define a couple of useful functions that convert some returned output into Pandas dataframe form, and display .pdf images directly in the notebook.

You do not need to understand these to follow the lesson.

In [2]:
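# A bit of code that will help us display the PDF output
# (HTML must be imported: the imports cell above only brings in Image)
from IPython.display import HTML

def PDF(filename):
    return HTML('<iframe src=%s width=700 height=350></iframe>' % filename)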

# Some code to return a Pandas dataframe, given tabular text
def to_df(result):
    return pd.read_table(io.StringIO(result), header=None)

Running a remote KEGG query

There is typically only a single step involved in obtaining a result from `KEGG` with Biopython:
  • run one of the functions provided by Bio.KEGG.REST, and catch the result in a variable.

The available functions are:

  • kegg_conv() - convert identifiers from KEGG to those for other databases
  • kegg_find() - find KEGG entries with matching query data
  • kegg_get() - retrieve data for a specific entry from KEGG
  • kegg_info() - get information about a KEGG database
  • kegg_link() - find entries in KEGG using a database cross-reference
  • kegg_list() - list entries in a database

The generic form of using these functions to get information from KEGG and place the output in the variable myvar is:

myvar = REST.<function>(<query>, <arg1>, <arg2>, ...).read()

where <function> is one of the functions above, <query> is a string containing your query for KEGG, and <arg1>, <arg2> and so on are arguments that may be required for some of the functions.

You will use some of these functions in the notebook cells below to get information from KEGG.
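
Each of these functions corresponds to an operation of the KEGG REST service, with the query and arguments becoming path components of the request URL. The sketch below only illustrates that mapping and makes no network request. `kegg_url` is a hypothetical helper, not a Biopython function, and the base address is an assumption about where the service currently lives.

```python
BASE_URL = "https://rest.kegg.jp"  # assumed service address


def kegg_url(operation, *args):
    """Return the REST URL for an operation (info, list, find, get, ...) and its arguments.

    Hypothetical helper for illustration; not part of Biopython.
    """
    parts = [operation] + [str(arg) for arg in args if arg is not None]
    return BASE_URL + "/" + "/".join(parts)


# URLs corresponding to some of the queries used in this notebook
print(kegg_url("info", "kegg"))
print(kegg_url("list", "pathway", "ksk"))
print(kegg_url("find", "genes", "shiga+toxin"))
```

Pasting one of these URLs into a browser is a quick way to check what a query returns before scripting it.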

kegg_info()

This function returns basic information about a specified `KEGG` database - much like visiting the landing page for that database.

For instance, to get information about the KEGG databases as a whole, you can use kegg_info("kegg") to get a handle from KEGG describing the databases, .read() the handle, and catch the resulting text in a variable:

result = REST.kegg_info("kegg").read()

We could try to convert this text to a Pandas dataframe with the to_df() function defined above:

to_df(result)

but not all data is suited to `pandas` dataframe representation. For kegg_info() results it is better to print the text directly with the print() function:

print(result)
In [3]:
# Perform the query
result = REST.kegg_info("kegg").read()

# Print the result
print(result)

# Convert result to dataframe
# NOTE: kegg_info() requests do not produce a suitable data format for dataframe representation
#to_df(result)
kegg             Kyoto Encyclopedia of Genes and Genomes
kegg             Release 85.0+/03-04, Mar 18
                 Kanehisa Laboratories
                 pathway     567,969 entries
                 brite       203,224 entries
                 module      461,504 entries
                 orthology    21,840 entries
                 genome        5,616 entries
                 genes     25,801,250 entries
                 compound     18,257 entries
                 glycan       11,015 entries
                 reaction     10,828 entries
                 rclass        3,108 entries
                 enzyme        7,146 entries
                 disease       2,035 entries
                 drug         10,486 entries
                 dgroup        2,048 entries
                 environ         856 entries
                 network         296 entries
                 variant         123 entries
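
Although this output does not fit a dataframe well, the entry counts are easy to pull out with a regular expression. The sketch below works on a few lines copied from the output above; `entry_counts` is a hypothetical helper, and the pattern assumes the "name, count, entries" layout shown here, so it may need adjusting if KEGG changes the format.

```python
import re

# A few lines copied from the kegg_info("kegg") output above
sample = """kegg             Kyoto Encyclopedia of Genes and Genomes
kegg             Release 85.0+/03-04, Mar 18
                 Kanehisa Laboratories
                 pathway     567,969 entries
                 genes     25,801,250 entries
                 compound     18,257 entries
"""


def entry_counts(info_text):
    """Return {database name: number of entries} from kegg_info("kegg") text."""
    counts = {}
    for line in info_text.splitlines():
        # Indented lines of the form "<name>   <count with commas> entries"
        match = re.match(r"\s+(\S+)\s+([\d,]+) entries\s*$", line)
        if match:
            counts[match.group(1)] = int(match.group(2).replace(",", ""))
    return counts


counts = entry_counts(sample)
print(counts)
```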

This gives us an overview of the available resources similar to the one on the KEGG landing page. However, the kegg_info() function is a little more powerful, as it can find information about specific databases:

In [4]:
# Print information about the PATHWAY database
result = REST.kegg_info("pathway").read()
print(result)
pathway          KEGG Pathway Database
path             Release 85.0+/03-04, Mar 18
                 Kanehisa Laboratories
                 567,969 entries

linked db        module
                 ko
                 genome
                 <org>
                 compound
                 glycan
                 reaction
                 rclass
                 enzyme
                 network
                 disease
                 drug
                 pubmed

and even about specific organisms (identified by their three- or four-letter KEGG organism code):

In [5]:
# Print information about Kitasatospora setae
result = REST.kegg_info("ksk").read()
print(result)
T01648           Kitasatospora setae KEGG Genes Database
ksk              Release 85.0+/03-04, Mar 18
                 Kanehisa Laboratories
                 7,673 entries

linked db        pathway
                 brite
                 module
                 ko
                 genome
                 enzyme
                 ncbi-proteinid
                 uniprot

kegg_list()

The `kegg_list()` function returns a table of entry identifiers and definitions for a specified database.

For example, to list all the entries in the PATHWAY database, you could use:

In [6]:
# Get all entries in the PATHWAY database as a dataframe
result = REST.kegg_list("pathway").read()
to_df(result)
Out[6]:
0 1
0 path:map00010 Glycolysis / Gluconeogenesis
1 path:map00020 Citrate cycle (TCA cycle)
2 path:map00030 Pentose phosphate pathway
3 path:map00040 Pentose and glucuronate interconversions
4 path:map00051 Fructose and mannose metabolism
5 path:map00052 Galactose metabolism
6 path:map00053 Ascorbate and aldarate metabolism
7 path:map00061 Fatty acid biosynthesis
8 path:map00062 Fatty acid elongation
9 path:map00071 Fatty acid degradation
10 path:map00072 Synthesis and degradation of ketone bodies
11 path:map00073 Cutin, suberine and wax biosynthesis
12 path:map00100 Steroid biosynthesis
13 path:map00120 Primary bile acid biosynthesis
14 path:map00121 Secondary bile acid biosynthesis
15 path:map00130 Ubiquinone and other terpenoid-quinone biosynt...
16 path:map00140 Steroid hormone biosynthesis
17 path:map00190 Oxidative phosphorylation
18 path:map00195 Photosynthesis
19 path:map00196 Photosynthesis - antenna proteins
20 path:map00220 Arginine biosynthesis
21 path:map00230 Purine metabolism
22 path:map00231 Puromycin biosynthesis
23 path:map00232 Caffeine metabolism
24 path:map00240 Pyrimidine metabolism
25 path:map00250 Alanine, aspartate and glutamate metabolism
26 path:map00253 Tetracycline biosynthesis
27 path:map00254 Aflatoxin biosynthesis
28 path:map00260 Glycine, serine and threonine metabolism
29 path:map00261 Monobactam biosynthesis
... ... ...
493 path:map07057 Antiparkinsonian agents
494 path:map07110 Benzoic acid family
495 path:map07112 1,2-Diphenyl substitution family
496 path:map07114 Naphthalene family
497 path:map07117 Benzodiazepine family
498 path:map07211 Serotonin receptor agonists/antagonists
499 path:map07212 Histamine H1 receptor antagonists
500 path:map07213 Dopamine receptor agonists/antagonists
501 path:map07214 beta-Adrenergic receptor agonists/antagonists
502 path:map07215 alpha-Adrenergic receptor agonists/antagonists
503 path:map07216 Catecholamine transferase inhibitors
504 path:map07217 Renin-angiotensin system inhibitors
505 path:map07218 HIV protease inhibitors
506 path:map07219 Cyclooxygenase inhibitors
507 path:map07220 Cholinergic and anticholinergic drugs
508 path:map07221 Nicotinic cholinergic receptor antagonists
509 path:map07222 Peroxisome proliferator-activated receptor (PP...
510 path:map07223 Retinoic acid receptor (RAR) and retinoid X re...
511 path:map07224 Opioid receptor agonists/antagonists
512 path:map07225 Glucocorticoid and mineralocorticoid receptor ...
513 path:map07226 Progesterone, androgen and estrogen receptor a...
514 path:map07227 Histamine H2/H3 receptor agonists/antagonists
515 path:map07228 Eicosanoid receptor agonists/antagonists
516 path:map07229 Angiotensin receptor and endothelin receptor a...
517 path:map07230 GABA-A receptor agonists/antagonists
518 path:map07231 Sodium channel blocking drugs
519 path:map07232 Potassium channel blocking and opening drugs
520 path:map07233 Ion transporter inhibitors
521 path:map07234 Neurotransmitter transporter inhibitors
522 path:map07235 N-Metyl-D-aspartic acid receptor antagonists

523 rows × 2 columns

To restrict the results to only those pathways that are present in K. setae, pass the organism code ksk as the second argument:

In [7]:
# Get all entries in the PATHWAY database for K. setae as a dataframe
result = REST.kegg_list("pathway", "ksk").read()
to_df(result)
Out[7]:
0 1
0 path:ksk00010 Glycolysis / Gluconeogenesis - Kitasatospora s...
1 path:ksk00020 Citrate cycle (TCA cycle) - Kitasatospora setae
2 path:ksk00030 Pentose phosphate pathway - Kitasatospora setae
3 path:ksk00040 Pentose and glucuronate interconversions - Kit...
4 path:ksk00051 Fructose and mannose metabolism - Kitasatospor...
5 path:ksk00052 Galactose metabolism - Kitasatospora setae
6 path:ksk00053 Ascorbate and aldarate metabolism - Kitasatosp...
7 path:ksk00061 Fatty acid biosynthesis - Kitasatospora setae
8 path:ksk00071 Fatty acid degradation - Kitasatospora setae
9 path:ksk00072 Synthesis and degradation of ketone bodies - K...
10 path:ksk00121 Secondary bile acid biosynthesis - Kitasatospo...
11 path:ksk00130 Ubiquinone and other terpenoid-quinone biosynt...
12 path:ksk00190 Oxidative phosphorylation - Kitasatospora setae
13 path:ksk00220 Arginine biosynthesis - Kitasatospora setae
14 path:ksk00230 Purine metabolism - Kitasatospora setae
15 path:ksk00240 Pyrimidine metabolism - Kitasatospora setae
16 path:ksk00250 Alanine, aspartate and glutamate metabolism - ...
17 path:ksk00253 Tetracycline biosynthesis - Kitasatospora setae
18 path:ksk00260 Glycine, serine and threonine metabolism - Kit...
19 path:ksk00261 Monobactam biosynthesis - Kitasatospora setae
20 path:ksk00270 Cysteine and methionine metabolism - Kitasatos...
21 path:ksk00280 Valine, leucine and isoleucine degradation - K...
22 path:ksk00281 Geraniol degradation - Kitasatospora setae
23 path:ksk00290 Valine, leucine and isoleucine biosynthesis - ...
24 path:ksk00300 Lysine biosynthesis - Kitasatospora setae
25 path:ksk00310 Lysine degradation - Kitasatospora setae
26 path:ksk00311 Penicillin and cephalosporin biosynthesis - Ki...
27 path:ksk00330 Arginine and proline metabolism - Kitasatospor...
28 path:ksk00332 Carbapenem biosynthesis - Kitasatospora setae
29 path:ksk00340 Histidine metabolism - Kitasatospora setae
... ... ...
100 path:ksk01057 Biosynthesis of type II polyketide products - ...
101 path:ksk01059 Biosynthesis of enediyne antibiotics - Kitasat...
102 path:ksk01100 Metabolic pathways - Kitasatospora setae
103 path:ksk01110 Biosynthesis of secondary metabolites - Kitasa...
104 path:ksk01120 Microbial metabolism in diverse environments -...
105 path:ksk01130 Biosynthesis of antibiotics - Kitasatospora setae
106 path:ksk01200 Carbon metabolism - Kitasatospora setae
107 path:ksk01210 2-Oxocarboxylic acid metabolism - Kitasatospor...
108 path:ksk01212 Fatty acid metabolism - Kitasatospora setae
109 path:ksk01220 Degradation of aromatic compounds - Kitasatosp...
110 path:ksk01230 Biosynthesis of amino acids - Kitasatospora setae
111 path:ksk01501 beta-Lactam resistance - Kitasatospora setae
112 path:ksk01502 Vancomycin resistance - Kitasatospora setae
113 path:ksk01503 Cationic antimicrobial peptide (CAMP) resistan...
114 path:ksk02010 ABC transporters - Kitasatospora setae
115 path:ksk02020 Two-component system - Kitasatospora setae
116 path:ksk02024 Quorum sensing - Kitasatospora setae
117 path:ksk02060 Phosphotransferase system (PTS) - Kitasatospor...
118 path:ksk03010 Ribosome - Kitasatospora setae
119 path:ksk03018 RNA degradation - Kitasatospora setae
120 path:ksk03020 RNA polymerase - Kitasatospora setae
121 path:ksk03030 DNA replication - Kitasatospora setae
122 path:ksk03050 Proteasome - Kitasatospora setae
123 path:ksk03060 Protein export - Kitasatospora setae
124 path:ksk03070 Bacterial secretion system - Kitasatospora setae
125 path:ksk03410 Base excision repair - Kitasatospora setae
126 path:ksk03420 Nucleotide excision repair - Kitasatospora setae
127 path:ksk03430 Mismatch repair - Kitasatospora setae
128 path:ksk03440 Homologous recombination - Kitasatospora setae
129 path:ksk04122 Sulfur relay system - Kitasatospora setae

130 rows × 2 columns
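
The two listings can be compared directly, because a reference pathway and its organism-specific counterpart share the same five-digit number. The sketch below uses a few rows copied from the two tables above and assumes the raw text is tab-separated (which is what to_df() relies on); `pathway_numbers` is a hypothetical helper name.

```python
# Rows copied from the two kegg_list() results above, as tab-separated text
reference_text = (
    "path:map00061\tFatty acid biosynthesis\n"
    "path:map00062\tFatty acid elongation\n"
    "path:map00071\tFatty acid degradation\n"
)
organism_text = (
    "path:ksk00061\tFatty acid biosynthesis - Kitasatospora setae\n"
    "path:ksk00071\tFatty acid degradation - Kitasatospora setae\n"
)


def pathway_numbers(list_text):
    """Map the five-digit pathway number to its description."""
    numbers = {}
    for line in list_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        identifier, description = line.split("\t", 1)
        numbers[identifier[-5:]] = description  # last five characters are the number
    return numbers


reference = pathway_numbers(reference_text)
organism = pathway_numbers(organism_text)

# Reference pathways with no counterpart in the organism-specific list
missing = sorted(set(reference) - set(organism))
for number in missing:
    print(number, reference[number])
```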

QUESTIONS
  1. How many entries are in the complete PATHWAY database?
  2. How many entries in the PATHWAY database are also present in K. setae?
  3. Are these the same answers you got in `lesson 08`?

If, instead of specifying one of the top-level KEGG databases, you specify an organism code, KEGG will return a list of gene entries for that organism:

In [8]:
# Get all genes from K. setae as a dataframe
result = REST.kegg_list("ksk").read()
to_df(result)
Out[8]:
0 1
0 ksk:KSE_00010t ttrA1; putative helicase
1 ksk:KSE_00020t hypothetical protein
2 ksk:KSE_00030t hypothetical protein
3 ksk:KSE_00040t hypothetical protein
4 ksk:KSE_00060t putative helicase
5 ksk:KSE_00070t hypothetical protein
6 ksk:KSE_00080t hypothetical protein
7 ksk:KSE_00090t hypothetical protein
8 ksk:KSE_00100t hypothetical protein
9 ksk:KSE_00110t hypothetical protein
10 ksk:KSE_00120t putative transposase
11 ksk:KSE_00130t hypothetical protein
12 ksk:KSE_00140t hypothetical protein
13 ksk:KSE_00150t hypothetical protein
14 ksk:KSE_00160t hypothetical protein
15 ksk:KSE_00170t hypothetical protein
16 ksk:KSE_00180t hypothetical protein
17 ksk:KSE_00190t hypothetical protein
18 ksk:KSE_00200t putative sesquiterpene cyclase
19 ksk:KSE_00210t hypothetical protein
20 ksk:KSE_00220t hypothetical protein
21 ksk:KSE_00230t hypothetical protein
22 ksk:KSE_00240t hypothetical protein
23 ksk:KSE_00250t hypothetical protein
24 ksk:KSE_00260t hypothetical protein
25 ksk:KSE_00270t hypothetical protein
26 ksk:KSE_00280t hypothetical protein
27 ksk:KSE_00290t hypothetical protein
28 ksk:KSE_00300t putative transposase
29 ksk:KSE_00310t hypothetical protein
... ... ...
7643 ksk:KSE_76430t hypothetical protein
7644 ksk:KSE_76440t putative transposase
7645 ksk:KSE_76450t hypothetical protein
7646 ksk:KSE_76460t hypothetical protein
7647 ksk:KSE_76470t hypothetical protein
7648 ksk:KSE_76480t hypothetical protein
7649 ksk:KSE_76490t hypothetical protein
7650 ksk:KSE_76500t hypothetical protein
7651 ksk:KSE_76510t hypothetical protein
7652 ksk:KSE_76520t hypothetical protein
7653 ksk:KSE_76530t hypothetical protein
7654 ksk:KSE_76540t putative sesquiterpene cyclase
7655 ksk:KSE_76550t hypothetical protein
7656 ksk:KSE_76560t hypothetical protein
7657 ksk:KSE_76570t hypothetical protein
7658 ksk:KSE_76580t hypothetical protein
7659 ksk:KSE_76590t hypothetical protein
7660 ksk:KSE_76600t hypothetical protein
7661 ksk:KSE_76610t hypothetical protein
7662 ksk:KSE_76620t putative transposase
7663 ksk:KSE_76630t hypothetical protein
7664 ksk:KSE_76640t hypothetical protein
7665 ksk:KSE_76650t hypothetical protein
7666 ksk:KSE_76660t hypothetical protein
7667 ksk:KSE_76670t hypothetical protein
7668 ksk:KSE_76680t putative helicase
7669 ksk:KSE_76700t hypothetical protein
7670 ksk:KSE_76710t hypothetical protein
7671 ksk:KSE_76720t hypothetical protein
7672 ksk:KSE_76730t ttrA2; putative helicase

7673 rows × 2 columns

kegg_find()

The `kegg_find()` function will search a named `KEGG` database with a specified query term.

For instance, to query the GENES database with the entry accession KSE_17560 you could use:

In [9]:
# Find a specific entry with a precise search term
result = REST.kegg_find("genes", "KSE_17560").read()
to_df(result)
Out[9]:
0 1
0 ksk:KSE_17560 dxs1; putative 1-deoxy-D-xylulose-5-phosphate ...

With the query above, KEGG returns information for the exact entry we've requested. But we can also use less precise search terms, and combine them with the + symbol. For example, to search for shiga toxin we would use the query:

"shiga+toxin"
In [10]:
# Find all shiga toxin genes
result = REST.kegg_find("genes", "shiga+toxin").read()
to_df(result)
Out[10]:
0 1
0 ece:Z1464 stx2A; shiga-like toxin II A subunit encoded b...
1 ece:Z1465 stx2B; shiga-like toxin II B subunit encoded b...
2 ece:Z3343 stx1B; shiga-like toxin 1 subunit B encoded wi...
3 ece:Z3344 stx1A; shiga-like toxin 1 subunit A encoded wi...
4 ecs:ECs1205 Shiga toxin 2 subunit A
5 ecs:ECs1206 Shiga toxin 2 subunit B
6 ecs:ECs2973 Shiga toxin I subunit B
7 ecs:ECs2974 Shiga toxin I subunit A
8 ecf:ECH74115_2905 shigatoxin 2, subunit B
9 ecf:ECH74115_2906 shiga toxin subunit A
10 ecf:ECH74115_3532 shiga toxin 2 B subunit
11 ecf:ECH74115_3533 shiga toxin subunit A
12 etw:ECSP_2722 stx2cB; Shiga-like toxin II subunit B precursor
13 etw:ECSP_2723 stx2A1; Shiga-like toxin II subunit A precursor
14 etw:ECSP_3252 stx2B; shiga toxin II subunit B
15 etw:ECSP_3253 stx2A2; shiga toxin II subunit A
16 elx:CDCO157_1154 Shiga toxin 2 subunit A
17 elx:CDCO157_1155 Shiga toxin 2 subunit B
18 elx:CDCO157_2738 Shiga toxin I subunit B precursor
19 elx:CDCO157_2739 Shiga toxin I subunit A precursor
20 eoj:ECO26_1599 Shiga toxin 1 subunit A
21 eoj:ECO26_1600 Shiga toxin 1 subunit B
22 eoi:ECO111_2429 Shiga toxin 2 subunit B
23 eoi:ECO111_2430 Shiga toxin 2 subunit A
24 eoi:ECO111_3361 Shiga toxin 1 subunit A
25 eoi:ECO111_3362 Shiga toxin 1 subunit B
26 eoh:ECO103_2844 Shiga toxin 2 subunit B
27 eoh:ECO103_2845 Shiga toxin 2 subunit A
28 eoh:ECO103_5197 Shiga toxin 1 subunit A
29 eoh:ECO103_5198 Shiga toxin 1 subunit B
... ... ...
78 vg:26516283 stxB, AU083_gp11; Escherichia phage phi191; sh...
79 vg:26516284 stxA, AU083_gp12; Escherichia phage phi191; sh...
80 vg:26519429 AU154_gp39; Shigella phage Ss-VASD; Stx1 A sub...
81 vg:26519430 AU154_gp40; Shigella phage Ss-VASD; Stx1 B sub...
82 vg:1481767 stx2A, Stx2II_p143; Escherichia phage Stx2 II;...
83 vg:1481768 stx2B, Stx2II_p144; Escherichia phage Stx2 II;...
84 vg:26798065 AXI88_gp33; Shigella phage 75/02 Stx; shiga to...
85 vg:26798066 AXI88_gp34; Shigella phage 75/02 Stx; shiga to...
86 vg:1481747 stx1A, Stx1_p142; Escherichia Stx1 converting ...
87 vg:1481748 stx1B, Stx1_p143; Escherichia Stx1 converting ...
88 vg:1261950 stxA2, 933Wp40; Enterobacteria phage 933W; Shi...
89 vg:1262010 stxB2, 933Wp41; Enterobacteria phage 933W; Shi...
90 vg:2641645 stxA1, PBV4795_ORF40; Enterobacteria phage BP-...
91 vg:2641657 stxB1, PBV4795_ORF41; Enterobacteria phage BP-...
92 vg:929695 stxA2e, P27p25; Enterobacteria phage phiP27; s...
93 vg:929727 stxB2e, P27p26; Enterobacteria phage phiP27; s...
94 vg:4397483 stx2A, Stx2-86_gp01; Stx2-converting phage 86;...
95 vg:4397484 stx2B, Stx2-86_gp02; Stx2-converting phage 86;...
96 vg:6159405 stx2A, pMIN27_41; Escherichia phage Min27; Shi...
97 vg:6159351 stx2B, pMIN27_42; Escherichia phage Min27; Shi...
98 vg:6973138 Stx2-1717_gp41; Stx2-converting phage 1717; ve...
99 vg:6972909 stx2cB, Stx2-1717_gp42; Stx2-converting phage ...
100 vg:6973079 YYZ_gp39; Enterobacteria phage YYZ-2008; Shiga...
101 vg:6973080 YYZ_gp40; Enterobacteria phage YYZ-2008; Shiga...
102 vg:13828571 stx2A, D300_gp43; Escherichia phage P13374; sh...
103 vg:13828535 stx2B, D300_gp42; Escherichia phage P13374; sh...
104 vg:14005228 F366_gp36; Escherichia phage TL-2011c; Shiga t...
105 vg:14005229 F366_gp37; Escherichia phage TL-2011c; Shiga t...
106 vg:1262249 stx2A, VT2-Sap42; Enterobacteria phage VT2-Sak...
107 vg:1262250 stx2B, VT2-Sap43; Enterobacteria phage VT2-Sak...

108 rows × 2 columns

We can restrict this search to specific organisms, such as Escherichia coli O111 H-11128 (EHEC), by supplying its three-letter code (here, eoi) as the database to be searched:

In [11]:
# Find all shiga toxin genes in eoi
result = REST.kegg_find("eoi", "shiga+toxin").read()
to_df(result)
Out[11]:
0 1
0 eoi:ECO111_2429 Shiga toxin 2 subunit B
1 eoi:ECO111_2430 Shiga toxin 2 subunit A
2 eoi:ECO111_3361 Shiga toxin 1 subunit A
3 eoi:ECO111_3362 Shiga toxin 1 subunit B
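
Going the other way, the organism code in front of the colon in each identifier tells you which organisms contributed hits to an unrestricted search. A minimal sketch, using a handful of rows copied from the shiga toxin results above and assuming tab-separated text; `hits_per_organism` is a hypothetical helper name.

```python
from collections import Counter

# Rows copied from the kegg_find("genes", "shiga+toxin") output above
sample = (
    "ecs:ECs1205\tShiga toxin 2 subunit A\n"
    "ecs:ECs1206\tShiga toxin 2 subunit B\n"
    "eoj:ECO26_1599\tShiga toxin 1 subunit A\n"
    "eoi:ECO111_2429\tShiga toxin 2 subunit B\n"
    "eoi:ECO111_2430\tShiga toxin 2 subunit A\n"
    "eoi:ECO111_3361\tShiga toxin 1 subunit A\n"
)


def hits_per_organism(find_text):
    """Count result rows per organism code (the part of the identifier before ':')."""
    codes = []
    for line in find_text.splitlines():
        if line.strip():
            identifier = line.split("\t", 1)[0]
            codes.append(identifier.split(":", 1)[0])
    return Counter(codes)


organism_hits = hits_per_organism(sample)
print(organism_hits.most_common())
```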

The kegg_find() query string can also search in specific fields of the entry. The format for this is:

"<query_value>/<field>"

So, to search for all compounds with a molecular weight between 300 and 310 mass units, you can use the code:

In [12]:
# Find all compounds with mass between 300 and 310 units
result = REST.kegg_find("compound", "300-310/mol_weight").read()
to_df(result)
Out[12]:
0 1
0 cpd:C00051 307.323480
1 cpd:C00200 306.336960
2 cpd:C00219 304.466880
3 cpd:C00239 307.197122
4 cpd:C00270 309.269860
5 cpd:C00357 301.187702
6 cpd:C00365 308.181882
7 cpd:C00389 302.235700
8 cpd:C00732 308.372760
9 cpd:C00777 300.435120
10 cpd:C00836 301.507760
11 cpd:C00891 302.494060
12 cpd:C00892 304.509940
13 cpd:C00941 305.181242
14 cpd:C01143 308.116884
15 cpd:C01169 307.429440
16 cpd:C01294 301.191002
17 cpd:C01416 303.352940
18 cpd:C01513 306.320420
19 cpd:C01541 308.327940
20 cpd:C01564 305.368820
21 cpd:C01617 304.251580
22 cpd:C01632 306.313820
23 cpd:C01670 306.360180
24 cpd:C01682 304.299700
25 cpd:C01709 302.278760
26 cpd:C01804 304.347600
27 cpd:C01851 303.352940
28 cpd:C02197 300.392060
29 cpd:C02354 305.181242
... ... ...
472 cpd:C20121 302.346242
473 cpd:C20129 308.327940
474 cpd:C20149 302.451000
475 cpd:C20201 302.364880
476 cpd:C20203 306.396640
477 cpd:C20208 302.451000
478 cpd:C20329 304.466880
479 cpd:C20389 308.341160
480 cpd:C20423 308.116884
481 cpd:C20428 300.435120
482 cpd:C20429 300.435120
483 cpd:C20431 300.349000
484 cpd:C20559 305.221002
485 cpd:C20693 302.407940
486 cpd:C20726 306.205762
487 cpd:C20848 308.116884
488 cpd:C20939 301.709520
489 cpd:C20962 300.311000
490 cpd:C20978 308.328160
491 cpd:C21053 309.269860
492 cpd:C21107 306.205762
493 cpd:C21255 307.343400
494 cpd:C21256 305.327520
495 cpd:C21257 305.327520
496 cpd:C21258 304.342760
497 cpd:C21259 306.358640
498 cpd:C21296 304.423820
499 cpd:C21323 306.310520
500 cpd:C21561 304.466880
501 cpd:C21562 304.466880

502 rows × 2 columns
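
The field query string and the returned table can both be handled with a few lines of plain Python. The sketch below builds a query in the "<query_value>/<field>" form described above and converts a few rows copied from the result into numbers. `field_query` and `parse_weights` are hypothetical helper names, and the raw text is assumed to be tab-separated.

```python
def field_query(value, field):
    """Build a kegg_find() query string of the form '<query_value>/<field>'."""
    return f"{value}/{field}"


def parse_weights(find_text):
    """Map compound identifier to molecular weight as a float."""
    weights = {}
    for line in find_text.splitlines():
        if line.strip():
            identifier, weight = line.split("\t")
            weights[identifier] = float(weight)
    return weights


query = field_query("300-310", "mol_weight")

# Rows copied from the result table above
sample = (
    "cpd:C00051\t307.323480\n"
    "cpd:C00200\t306.336960\n"
    "cpd:C00270\t309.269860\n"
    "cpd:C00777\t300.435120\n"
)
weights = parse_weights(sample)
lightest = min(weights, key=weights.get)
heaviest = max(weights, key=weights.get)
print(query, lightest, heaviest)
```

In the notebook, the string returned by field_query() could be passed as the second argument to REST.kegg_find("compound", ...), as in the cell above.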

kegg_get()

Most functions you've seen so far will return two columns of data: the first column being the entry accession, and the second column being a description of that entry, or the requested value.

The `kegg_get()` function lets us retrieve specific entries from `KEGG` - such as our search results - in named formats.

For example, the first compound in our search for molecular weights in the range 300-310 above has entry accession cpd:C00051. We can recover this entry as follows:

In [13]:
# Get the entry information for cpd:C00051
result = REST.kegg_get("cpd:C00051").read()
print(result)
ENTRY       C00051                      Compound
NAME        Glutathione;
            5-L-Glutamyl-L-cysteinylglycine;
            N-(N-gamma-L-Glutamyl-L-cysteinyl)glycine;
            gamma-L-Glutamyl-L-cysteinyl-glycine;
            GSH;
            Reduced glutathione
FORMULA     C10H17N3O6S
EXACT_MASS  307.0838
MOL_WEIGHT  307.3235
REMARK      Same as: D00014
REACTION    R00094 R00115 R00120 R00274 R00494 R00497 R00499 R00527 
            R00547 R00900 R01108 R01109 R01110 R01111 R01113 R01262 
            R01292 R01736 R01875 R01917 R01918 R02530 R02824 R03059 
            R03082 R03167 R03522 R03822 R03915 R03956 R03984 R04039 
            R04090 R04860 R05267 R05269 R05402 R05403 R05714 R05717 
            R05748 R06982 R07002 R07003 R07004 R07023 R07024 R07025 
            R07026 R07034 R07035 R07069 R07070 R07083 R07084 R07091 
            R07092 R07093 R07094 R07100 R07113 R07116 R07124 R08280 
            R08350 R08351 R08352 R08353 R08354 R08355 R08511 R08512 
            R08678 R09338 R09367 R09368 R09409 R11411 R11650 R11652 
            R11659 R11734 R11736 R11737 R11739 R11861 R11905 R11929 
            R11947
PATHWAY     map00270  Cysteine and methionine metabolism
            map00480  Glutathione metabolism
            map01100  Metabolic pathways
            map02010  ABC transporters
            map04216  Ferroptosis
            map04918  Thyroid hormone synthesis
            map04976  Bile secretion
MODULE      M00118  Glutathione biosynthesis, glutamate => glutathione
ENZYME      1.5.4.1         1.8.1.7         1.8.1.9         1.8.1.10        
            1.8.3.3         1.8.4.1         1.8.4.2         1.8.4.3         
            1.8.4.4         1.8.4.7         1.8.4.9         1.8.5.1         
            1.8.5.7         1.8.5.8         1.11.1.9        1.11.1.12       
            1.13.11.18      1.14.14.43      1.14.14.45      1.20.4.2        
            2.3.2.2         2.3.2.15        2.5.1.18        2.5.1.-         
            2.8.1.3         3.1.2.6         3.1.2.7         3.1.2.12        
            3.1.2.13        3.4.19.13       3.5.1.78        3.5.1.-         
            4.3.2.7         4.4.1.5         4.4.1.20        4.4.1.22        
            4.4.1.34        6.3.1.8         6.3.1.9         6.3.2.3
BRITE       Compounds with biological roles [BR:br08001]
             Vitamins and Cofactors
              Cofactors
               Coenzymes
                C00051  Glutathione
            Anatomical Therapeutic Chemical (ATC) classification [BR:br08303]
             V VARIOUS
              V03 ALL OTHER THERAPEUTIC PRODUCTS
               V03A ALL OTHER THERAPEUTIC PRODUCTS
                V03AB Antidotes
                 V03AB32 Glutathione
                  D00014  Glutathione (JP17)
            Therapeutic category of drugs in Japan [BR:br08301]
             1  Agents affecting nervous system and sensory organs
              13  Agents affecting sensory organs
               131  Ophthalmic agents
                1319  Others
                 D00014  Glutathione (JP17)
             3  Agents affecting metabolism
              39  Other agents affecting metabolism
               392  Antidotes
                3922  Glutathiones
                 D00014  Glutathione (JP17)
            Drugs listed in the Japanese Pharmacopoeia [BR:br08311]
             Chemicals
              D00014  Glutathione
            Drug classes of therapeutic agents [br08360.html]
             Endocrine and hormonal agents
              D00014
            Animal drugs in Japan [BR:br08331]
             96  Agents affecting metabolism
              967  Agents for liver disease and antidotes
               9676  Other amino acid and preparations
                C00051  Glutathione
DBLINKS     CAS: 70-18-8
            PubChem: 3353
            ChEBI: 16856
            ChEMBL: CHEMBL1514919 CHEMBL1543
            KNApSAcK: C00001518
            PDB-CCD: GSH
            3DMET: B01138
            NIKKAJI: J10.686K
ATOM        20
            1   O6a O    24.0100  -16.3100
            2   C6a C    25.2000  -15.6100
            3   C1c C    26.4600  -16.3100
            4   C1b C    27.6500  -15.6100
            5   C1b C    28.8400  -16.3100
            6   C5a C    30.1000  -15.6100
            7   N1b N    31.2900  -16.3100
            8   C1c C    32.4800  -15.6100
            9   C5a C    33.7400  -16.3100
            10  N1b N    34.9300  -15.6100
            11  C1b C    36.1200  -16.3100
            12  C6a C    37.3800  -15.6100
            13  O6a O    38.5700  -16.3100
            14  O6a O    25.2000  -14.2100
            15  N1a N    26.4600  -17.7100
            16  O5a O    30.1000  -14.2100
            17  C1b C    32.4800  -14.2100
            18  S1a S    33.6700  -13.5100
            19  O5a O    33.7400  -17.7100
            20  O6a O    37.3800  -14.2100
BOND        19
            1     1   2 1
            2     2   3 1
            3     3   4 1
            4     4   5 1
            5     5   6 1
            6     6   7 1
            7     7   8 1
            8     8   9 1
            9     9  10 1
            10   10  11 1
            11   11  12 1
            12   12  13 1
            13    2  14 2
            14    3  15 1 #Down
            15    6  16 2
            16    8  17 1 #Up
            17   17  18 1
            18    9  19 2
            19   12  20 2
///
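
The default result is a flat-file record: a keyword starts in the first column of a line, indented lines continue the previous keyword, and "///" ends the entry. That layout is simple enough to split into fields by hand, as in the sketch below. `parse_flat_entry` is a hypothetical helper written for this example; it flattens nested sections such as BRITE into a plain list of lines, and the sample is a shortened copy of the record above.

```python
# Shortened copy of the cpd:C00051 record shown above
sample_entry = (
    "ENTRY       C00051                      Compound\n"
    "NAME        Glutathione;\n"
    "            5-L-Glutamyl-L-cysteinylglycine;\n"
    "FORMULA     C10H17N3O6S\n"
    "EXACT_MASS  307.0838\n"
    "MOL_WEIGHT  307.3235\n"
    "///\n"
)


def parse_flat_entry(text):
    """Split one flat-file entry into {keyword: [value lines]}."""
    fields = {}
    current = None
    for line in text.splitlines():
        if line.startswith("///"):
            break  # end-of-entry marker
        if line[:1].strip():
            # Keyword line: text begins in the first column
            current, _, value = line.partition(" ")
            fields[current] = []
        else:
            # Continuation line: belongs to the most recent keyword
            value = line
        value = value.strip()
        if current is not None and value:
            fields[current].append(value)
    return fields


fields = parse_flat_entry(sample_entry)
print(fields["NAME"])
print(float(fields["MOL_WEIGHT"][0]))
```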

QUESTIONS
  1. What information is returned in the default result?

KEGG provides a number of different entry types, which cannot all be recovered in exactly the same ways. For instance, the COMPOUND entries typically have an associated molecular structure image, which can be recovered with kegg_get() by specifying the format to be "image":

In [14]:
# Display molecular structure for cpd:C00051
result = REST.kegg_get("cpd:C00051", "image").read()
Image(result)
Out[14]:

GENES entries describe sequences, so they can be recovered as their database entries (the default), or as FASTA-format nucleotide or protein sequences:

In [15]:
# Get entry information for KSE_17560
result = REST.kegg_get("ksk:KSE_17560").read()
print(result)
ENTRY       KSE_17560         CDS       T01648
NAME        dxs1
DEFINITION  (GenBank) putative 1-deoxy-D-xylulose-5-phosphate synthase
ORTHOLOGY   K01662  1-deoxy-D-xylulose-5-phosphate synthase [EC:2.2.1.7]
ORGANISM    ksk  Kitasatospora setae
PATHWAY     ksk00730  Thiamine metabolism
            ksk00900  Terpenoid backbone biosynthesis
            ksk01100  Metabolic pathways
            ksk01110  Biosynthesis of secondary metabolites
            ksk01130  Biosynthesis of antibiotics
MODULE      ksk_M00096  C5 isoprenoid biosynthesis, non-mevalonate pathway
BRITE       KEGG Orthology (KO) [BR:ksk00001]
             Metabolism
              Metabolism of cofactors and vitamins
               00730 Thiamine metabolism
                KSE_17560 (dxs1)
              Metabolism of terpenoids and polyketides
               00900 Terpenoid backbone biosynthesis
                KSE_17560 (dxs1)
            Enzymes [BR:ksk01000]
             2. Transferases
              2.2  Transferring aldehyde or ketonic groups
               2.2.1  Transketolases and transaldolases
                2.2.1.7  1-deoxy-D-xylulose-5-phosphate synthase
                 KSE_17560 (dxs1)
POSITION    complement(1952373..1954298)
MOTIF       Pfam: DXP_synthase_N Transket_pyr Transketolase_C TPP_enzyme_C E1_dh Transketolase_N DUF4054 PFOR_II
DBLINKS     NCBI-ProteinID: BAJ27580
            NITE: KSE_17560
            UniProt: E4N8P9
AASEQ       641
            MPLLSQITGPADLRRLHPEQLPLLADEIRDFLIDAVTRTGGHLGPNLGVVELSIALHRVF
            DSPRDRVLWDTGHQAYVHKLLTGRQDFSRLRAKDGLSGYPSRAESEHDLIENSHASTALG
            YADGIAKANQLLGADRHTVAVIGDGALTGGMAWEALNNIAEAEDRPLVIVVNDNERSYAP
            TIGGLAHHLATLRTTRGYERFLAWGKDALQRTPVVGPPLFDALHGAKKGFKDAFAPQGMF
            EDLGLKYLGPIDGHDIAAVEQALRQARNFGGPVIVHCLTVKGRGYRPAEQDEADRFHAVG
            PIDPYTCLPISPSAGASWTSVFSQEMLALGAERPDLVAVTAAMLHPVGLGPFAAAHPGRT
            YDVGIAEQHAVASAAGLATGGLHPVVAVYATFLNRAFDQVLMDVALHKLGVTFVLDRAGV
            TGNDGASHNGMWDMSILQVVPGLRLAAPRDADRLREQLREAVAVEDAPTVVRFPKGDLGP
            EIPAVERIGGVDVLARTGPSPDVLLVAVGSMAPACLDAAALLAAEGITATVVDPRWVKPV
            DPALVALAAAHRMVVTVEDNGRAGGVGAAVAQAMRDAEVDTPLRDLGVPQEFLAHASRGE
            ILEEIGLTGTGVAAQTAAYARRLLPGTRSGAQEYRPRVPRK
NTSEQ       1926
            atgccactgctgagccagatcaccgggcccgccgacctcagacgactgcaccccgagcag
            ctgccgctgctcgccgacgagatccgcgacttcctgatcgacgccgtcacccgcaccggc
            ggccacctcggccccaacctcggcgtggtcgagctcagcatcgccctacaccgggtcttc
            gactccccgcgcgaccgcgtcctgtgggacaccggccaccaggcctacgtgcacaagctg
            ctcaccggccggcaggacttcagccggctgcgcgccaaggacggcctctccggctacccc
            tcgcgcgccgagtccgaacacgacctgatcgagaactcgcacgcctccaccgcgctcggc
            tacgccgacggcatcgccaaggccaaccaactgctcggcgccgaccggcacaccgtcgcc
            gtgatcggcgacggcgcgctcaccggcggcatggcctgggaggcgctcaacaacatcgcc
            gaggccgaggaccgcccgctggtcatcgtcgtcaacgacaacgagcgctcctacgcgccc
            accatcggcggcctcgcccaccacctcgccaccctgcgcaccacccgcggctacgagcgc
            ttcctcgcctggggcaaggacgccctgcagcgcacccccgtggtcgggccgccgctgttc
            gacgcgctgcacggcgccaagaagggcttcaaggacgccttcgccccgcagggcatgttc
            gaggacctcggtctgaagtacctcggcccgatcgacggccacgacatcgccgccgtcgaa
            caggcgctgcgccaggcccggaacttcggcgggcccgtcatcgtgcactgcctgaccgtc
            aagggccgcggctaccggcccgccgagcaggacgaggccgaccgcttccacgccgtcggc
            ccgatcgacccgtacacctgcctgccgatctcgccgtccgccggggcctcctggacttcg
            gtgttcagccaggagatgctcgccctcggcgccgagcggcccgacctggtcgccgtcacc
            gccgcgatgctgcaccccgtcgggctcggcccgttcgccgccgcgcaccccgggcggacc
            tacgacgtcgggatcgccgagcagcacgccgtcgcctccgccgccggcctggccaccggg
            gggctgcaccccgtcgtcgcggtgtacgcgaccttcctgaaccgggccttcgaccaggtg
            ctgatggacgtcgcgctgcacaagctgggcgtcaccttcgtgctcgaccgggccggggtc
            accggcaacgacggggcctcgcacaacggcatgtgggacatgtcgatcctgcaggtcgtg
            cccgggctgcggctggccgcgccgcgcgacgccgaccggctgcgcgaacagctccgggag
            gccgtcgcggtcgaggacgcgcccaccgtggtgcgcttccccaagggcgacctcggcccc
            gagatcccggcggtcgagcggatcggcggcgtcgacgtgctggcccgcaccggccccagc
            cccgacgtgctgctggtcgccgtcggctcgatggcccccgcctgcctggacgccgccgcg
            ctgctcgccgccgagggcatcaccgccaccgtcgtcgacccgcgctgggtcaagcccgtc
            gaccccgccctcgtcgcgctggccgccgcgcaccggatggtggtcaccgtcgaggacaac
            gggcgggccggcggcgtcggcgccgccgtcgcccaggcgatgcgggacgccgaggtcgac
            accccgctgcgcgacctcggcgtcccgcaggagttcctggcgcacgcctcgcgcggtgag
            atcctggaggagatcggactcaccggcaccggcgtcgccgcccagaccgccgcctacgcc
            cgccgcctgctgcccggcacccggagcggcgcccaggagtaccggccccgggtgccgcgc
            aagtag
///

In [16]:
# Get coding sequence for KSE_17560
result = REST.kegg_get("ksk:KSE_17560", "ntseq").read()
print(result)
>ksk:KSE_17560 K01662 1-deoxy-D-xylulose-5-phosphate synthase [EC:2.2.1.7] | (GenBank) dxs1; putative 1-deoxy-D-xylulose-5-phosphate synthase (N)
atgccactgctgagccagatcaccgggcccgccgacctcagacgactgcaccccgagcag
ctgccgctgctcgccgacgagatccgcgacttcctgatcgacgccgtcacccgcaccggc
ggccacctcggccccaacctcggcgtggtcgagctcagcatcgccctacaccgggtcttc
gactccccgcgcgaccgcgtcctgtgggacaccggccaccaggcctacgtgcacaagctg
ctcaccggccggcaggacttcagccggctgcgcgccaaggacggcctctccggctacccc
tcgcgcgccgagtccgaacacgacctgatcgagaactcgcacgcctccaccgcgctcggc
tacgccgacggcatcgccaaggccaaccaactgctcggcgccgaccggcacaccgtcgcc
gtgatcggcgacggcgcgctcaccggcggcatggcctgggaggcgctcaacaacatcgcc
gaggccgaggaccgcccgctggtcatcgtcgtcaacgacaacgagcgctcctacgcgccc
accatcggcggcctcgcccaccacctcgccaccctgcgcaccacccgcggctacgagcgc
ttcctcgcctggggcaaggacgccctgcagcgcacccccgtggtcgggccgccgctgttc
gacgcgctgcacggcgccaagaagggcttcaaggacgccttcgccccgcagggcatgttc
gaggacctcggtctgaagtacctcggcccgatcgacggccacgacatcgccgccgtcgaa
caggcgctgcgccaggcccggaacttcggcgggcccgtcatcgtgcactgcctgaccgtc
aagggccgcggctaccggcccgccgagcaggacgaggccgaccgcttccacgccgtcggc
ccgatcgacccgtacacctgcctgccgatctcgccgtccgccggggcctcctggacttcg
gtgttcagccaggagatgctcgccctcggcgccgagcggcccgacctggtcgccgtcacc
gccgcgatgctgcaccccgtcgggctcggcccgttcgccgccgcgcaccccgggcggacc
tacgacgtcgggatcgccgagcagcacgccgtcgcctccgccgccggcctggccaccggg
gggctgcaccccgtcgtcgcggtgtacgcgaccttcctgaaccgggccttcgaccaggtg
ctgatggacgtcgcgctgcacaagctgggcgtcaccttcgtgctcgaccgggccggggtc
accggcaacgacggggcctcgcacaacggcatgtgggacatgtcgatcctgcaggtcgtg
cccgggctgcggctggccgcgccgcgcgacgccgaccggctgcgcgaacagctccgggag
gccgtcgcggtcgaggacgcgcccaccgtggtgcgcttccccaagggcgacctcggcccc
gagatcccggcggtcgagcggatcggcggcgtcgacgtgctggcccgcaccggccccagc
cccgacgtgctgctggtcgccgtcggctcgatggcccccgcctgcctggacgccgccgcg
ctgctcgccgccgagggcatcaccgccaccgtcgtcgacccgcgctgggtcaagcccgtc
gaccccgccctcgtcgcgctggccgccgcgcaccggatggtggtcaccgtcgaggacaac
gggcgggccggcggcgtcggcgccgccgtcgcccaggcgatgcgggacgccgaggtcgac
accccgctgcgcgacctcggcgtcccgcaggagttcctggcgcacgcctcgcgcggtgag
atcctggaggagatcggactcaccggcaccggcgtcgccgcccagaccgccgcctacgcc
cgccgcctgctgcccggcacccggagcggcgcccaggagtaccggccccgggtgccgcgc
aagtag

In [17]:
# Get protein sequence for KSE_17560
result = REST.kegg_get("ksk:KSE_17560", "aaseq").read()
print(result)
>ksk:KSE_17560 K01662 1-deoxy-D-xylulose-5-phosphate synthase [EC:2.2.1.7] | (GenBank) dxs1; putative 1-deoxy-D-xylulose-5-phosphate synthase (A)
MPLLSQITGPADLRRLHPEQLPLLADEIRDFLIDAVTRTGGHLGPNLGVVELSIALHRVF
DSPRDRVLWDTGHQAYVHKLLTGRQDFSRLRAKDGLSGYPSRAESEHDLIENSHASTALG
YADGIAKANQLLGADRHTVAVIGDGALTGGMAWEALNNIAEAEDRPLVIVVNDNERSYAP
TIGGLAHHLATLRTTRGYERFLAWGKDALQRTPVVGPPLFDALHGAKKGFKDAFAPQGMF
EDLGLKYLGPIDGHDIAAVEQALRQARNFGGPVIVHCLTVKGRGYRPAEQDEADRFHAVG
PIDPYTCLPISPSAGASWTSVFSQEMLALGAERPDLVAVTAAMLHPVGLGPFAAAHPGRT
YDVGIAEQHAVASAAGLATGGLHPVVAVYATFLNRAFDQVLMDVALHKLGVTFVLDRAGV
TGNDGASHNGMWDMSILQVVPGLRLAAPRDADRLREQLREAVAVEDAPTVVRFPKGDLGP
EIPAVERIGGVDVLARTGPSPDVLLVAVGSMAPACLDAAALLAAEGITATVVDPRWVKPV
DPALVALAAAHRMVVTVEDNGRAGGVGAAVAQAMRDAEVDTPLRDLGVPQEFLAHASRGE
ILEEIGLTGTGVAAQTAAYARRLLPGTRSGAQEYRPRVPRK
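
The FASTA text returned by `kegg_get()` is a plain string, so it can be split into a header and a sequence for downstream work. The cell below is a minimal sketch using only the standard library; the helper name `parse_fasta` is hypothetical, and the example string is a truncated excerpt (header plus the first two sequence lines) of the protein output above, so that the cell runs without contacting KEGG. In practice, Biopython's `SeqIO.parse()` with a `StringIO` handle would do the same job.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Truncated excerpt of the output above (NOT the full sequence)
example = """>ksk:KSE_17560 K01662 1-deoxy-D-xylulose-5-phosphate synthase [EC:2.2.1.7] | (GenBank) dxs1; putative 1-deoxy-D-xylulose-5-phosphate synthase (A)
MPLLSQITGPADLRRLHPEQLPLLADEIRDFLIDAVTRTGGHLGPNLGVVELSIALHRVF
DSPRDRVLWDTGHQAYVHKLLTGRQDFSRLRAKDGLSGYPSRAESEHDLIENSHASTALG
"""

records = parse_fasta(example)
header, sequence = records[0]
entry_id = header.split()[0]   # the KEGG identifier is the first token
print(entry_id, len(sequence))
```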

Retrieving pathways

`KEGG` is practically synonymous with its excellent pathway diagrams, and it should be no surprise that you can retrieve these using Python, too. You can get these images directly with `kegg_get()`, using the `"image"` format.

To specify one of the generic pathway maps, combine the `map` prefix with the five-digit pathway number to make a query of the form `mapNNNNN`, as in the cells below.

In [18]:
# Get map of fatty-acid biosynthesis
result = REST.kegg_get("map00061", "image").read()
Image(result)
Out[18]:
In [19]:
# Get map of central metabolism
result = REST.kegg_get("map01100", "image").read()
Image(result)
Out[19]:

If you want to retrieve the pathway map corresponding to a particular organism, replace the prefix `map` with the KEGG code for that organism (three or four letters), as in the examples below for Kitasatospora, where `map` is replaced with `ksk`:

In [20]:
# Get map of fatty-acid biosynthesis in Kitasatospora
result = REST.kegg_get("ksk00061", "image").read()
Image(result)
Out[20]:
In [21]:
# Get map of central metabolism in Kitasatospora
result = REST.kegg_get("ksk01100", "image").read()
Image(result)
Out[21]:
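
The identifiers used in the cells above follow a simple pattern: a prefix (`map` or an organism code) followed by a five-digit pathway number. The cell below is a small sketch of a helper that builds such identifiers; the function name `pathway_id` is hypothetical and not part of Biopython. The resulting string could then be passed to `REST.kegg_get()` as in the cells above.

```python
def pathway_id(number, prefix="map"):
    """Build a KEGG pathway identifier such as map00061 or ksk00061."""
    # Pad the pathway number with leading zeros to five digits
    return f"{prefix}{int(number):05d}"


print(pathway_id(61))            # generic reference map
print(pathway_id(1100, "ksk"))   # Kitasatospora-specific map
```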

KEGG provides copious information about pathways in the accompanying database entries, which can be obtained by omitting the format argument:

In [22]:
# Get data for fatty-acid biosynthesis in Kitasatospora
result = REST.kegg_get("ksk00061").read()
print(result)
ENTRY       ksk00061                    Pathway
NAME        Fatty acid biosynthesis - Kitasatospora setae
CLASS       Metabolism; Lipid metabolism
PATHWAY_MAP ksk00061  Fatty acid biosynthesis
MODULE      ksk_M00082  Fatty acid biosynthesis, initiation [PATH:ksk00061]
            ksk_M00083  Fatty acid biosynthesis, elongation [PATH:ksk00061]
ORGANISM    Kitasatospora setae [GN:ksk]
GENE        KSE_65020  putative acyl-CoA carboxylase [KO:K01962 K01963] [EC:2.1.3.15 6.4.1.2 2.1.3.15 6.4.1.2]
            KSE_72490  putative acyl-CoA carboxylase [KO:K01962 K01963] [EC:2.1.3.15 6.4.1.2 2.1.3.15 6.4.1.2]
            KSE_72500  putative acyl-CoA carboxylase [KO:K02160]
            KSE_72510  putative acyl-CoA carboxylase [KO:K01961] [EC:6.3.4.14 6.4.1.2]
            KSE_26830  accA; putative acetyl-CoA carboxylase biotin carboxylase [KO:K11263] [EC:6.3.4.14 6.4.1.3 6.4.1.2]
            KSE_29850  putative propionyl-CoA carboxylase alpha subunit [KO:K11263] [EC:6.3.4.14 6.4.1.3 6.4.1.2]
            KSE_24970  fabD1; putative malonyl-CoA--acyl carrier protein transacylase [KO:K00645] [EC:2.3.1.39]
            KSE_42300  fabD2; putative malonyl-CoA--acyl carrier protein transacylase [KO:K00645] [EC:2.3.1.39]
            KSE_73570  bfmI; malonyl transferase [KO:K00645] [EC:2.3.1.39]
            KSE_27260  hypothetical protein [KO:K00648] [EC:2.3.1.180]
            KSE_27270  hypothetical protein [KO:K00648] [EC:2.3.1.180]
            KSE_33040  fabH3; putative 3-oxoacyl-[acyl-carrier-protein] synthase III [KO:K00648] [EC:2.3.1.180]
            KSE_65900  hypothetical protein [KO:K00648] [EC:2.3.1.180]
            KSE_24980  fabH1; putative 3-oxoacyl-[acyl-carrier-protein] synthase III [KO:K00648] [EC:2.3.1.180]
            KSE_65240  hypothetical protein [KO:K00648] [EC:2.3.1.180]
            KSE_65510  fabH5; putative 3-oxoacyl-[acyl-carrier-protein] synthase III [KO:K00648] [EC:2.3.1.180]
            KSE_65850  hypothetical protein [KO:K00648] [EC:2.3.1.180]
            KSE_67490  hypothetical protein [KO:K00648] [EC:2.3.1.180]
            KSE_73340  fabH8; putative 3-oxoacyl-[acyl-carrier-protein] synthase III [KO:K00648] [EC:2.3.1.180]
            KSE_71440  fabH6; putative 3-oxoacyl-[acyl-carrier-protein] synthase III [KO:K00648] [EC:2.3.1.180]
            KSE_27340  fabH2; putative 3-oxoacyl-[acyl-carrier-protein] synthase III [KO:K00648] [EC:2.3.1.180]
            KSE_42420  putative 3-oxoacyl-[acyl-carrier-protein] synthase [KO:K09458] [EC:2.3.1.179]
            KSE_15050  fabF1; putative 3-oxoacyl-[acyl-carrier-protein] synthase II [KO:K09458] [EC:2.3.1.179]
            KSE_65980  putative 3-oxoacyl-[acyl-carrier-protein] synthase [KO:K09458] [EC:2.3.1.179]
            KSE_25000  fabF2; putative 3-oxoacyl-[acyl-carrier-protein] synthase II [KO:K09458] [EC:2.3.1.179]
            KSE_67350  fabF4; putative 3-oxoacyl-[acyl-carrier-protein] synthase II [KO:K09458] [EC:2.3.1.179]
            KSE_65990  putative 3-oxoacyl-[acyl-carrier-protein] synthase [KO:K09458] [EC:2.3.1.179]
            KSE_59460  fabF7; putative 3-oxoacyl-[acyl-carrier-protein] synthase II [KO:K09458] [EC:2.3.1.179]
            KSE_65090  fabF3; putative 3-oxoacyl-[acyl-carrier-protein] synthase II [KO:K09458] [EC:2.3.1.179]
            KSE_65120  putative 3-oxoacyl-[acyl-carrier-protein] synthase [KO:K09458] [EC:2.3.1.179]
            KSE_67380  putative 3-oxoacyl-[acyl-carrier-protein] synthase [KO:K09458] [EC:2.3.1.179]
            KSE_73360  fabF5; putative 3-oxoacyl-[acyl-carrier-protein] synthase II [KO:K09458] [EC:2.3.1.179]
            KSE_27320  fabF6; putative 3-oxoacyl-[acyl-carrier-protein] synthase II [KO:K09458] [EC:2.3.1.179]
            KSE_42370  fabG; putative 3-oxoacyl-[acyl-carrier-protein] reductase [KO:K00059] [EC:1.1.1.100]
            KSE_56930  putative oxidoreductase [KO:K00059] [EC:1.1.1.100]
            KSE_57080  putative 3-oxoacyl-[acyl-carrier-protein] reductase [KO:K00059] [EC:1.1.1.100]
            KSE_65960  putative 3-oxoacyl-[acyl-carrier-protein] reductase [KO:K00059] [EC:1.1.1.100]
            KSE_54450  putative oxidoreductase [KO:K00059] [EC:1.1.1.100]
            KSE_64480  putative 3-oxoacyl-[acyl-carrier-protein] reductase [KO:K00059] [EC:1.1.1.100]
            KSE_17510  putative 3-oxoacyl-[acyl-carrier-protein] reductase [KO:K00059] [EC:1.1.1.100]
            KSE_11260  putative oxidoreductase [KO:K00059] [EC:1.1.1.100]
            KSE_46920  putative 3-oxoacyl-[acyl-carrier-protein] reductase [KO:K00059] [EC:1.1.1.100]
            KSE_08640  putative oxidoreductase [KO:K00059] [EC:1.1.1.100]
            KSE_75270  putative oxidoreductase [KO:K00059] [EC:1.1.1.100]
            KSE_42390  fabZ; putative beta-hydroxyacyl-[acyl-carrier-protein] dehydratase [KO:K02372] [EC:4.2.1.59]
            KSE_57090  fabI; putative enoyl-[acyl-carrier-protein] reductase [KO:K00208] [EC:1.3.1.10 1.3.1.9]
            KSE_54350  desA; acyl-[acyl-carrier-protein] desaturase [KO:K03921] [EC:1.14.19.26 1.14.19.11 1.14.19.2]
            KSE_25760  fadD4; putative long-chain fatty-acid--CoA ligase [KO:K01897] [EC:6.2.1.3]
            KSE_21700  fadD1; putative long-chain fatty-acid--CoA ligase [KO:K01897] [EC:6.2.1.3]
            KSE_17190  fadD3; putative long-chain fatty-acid--CoA ligase [KO:K01897] [EC:6.2.1.3]
            KSE_73410  bfmM; acyl-CoA ligase [KO:K01897] [EC:6.2.1.3]
            KSE_62790  fadD2; putative long-chain fatty-acid--CoA ligase [KO:K01897] [EC:6.2.1.3]
            KSE_13030  hypothetical protein [KO:K01897] [EC:6.2.1.3]
COMPOUND    C00024  Acetyl-CoA
            C00083  Malonyl-CoA
            C00154  Palmitoyl-CoA
            C00229  Acyl-carrier protein
            C00249  Hexadecanoic acid
            C00712  (9Z)-Octadecenoic acid
            C01203  Oleoyl-[acyl-carrier protein]
            C01209  Malonyl-[acyl-carrier protein]
            C01530  Octadecanoic acid
            C01571  Decanoic acid
            C02679  Dodecanoic acid
            C03939  Acetyl-[acyl-carrier protein]
            C04088  Octadecanoyl-[acyl-carrier protein]
            C04180  cis-Dec-3-enoyl-[acp]
            C04246  But-2-enoyl-[acyl-carrier protein]
            C04618  (3R)-3-Hydroxybutanoyl-[acyl-carrier protein]
            C04619  (3R)-3-Hydroxydecanoyl-[acyl-carrier protein]
            C04620  (3R)-3-Hydroxyoctanoyl-[acyl-carrier protein]
            C04633  (3R)-3-Hydroxypalmitoyl-[acyl-carrier protein]
            C04688  (3R)-3-Hydroxytetradecanoyl-[acyl-carrier protein]
            C05223  Dodecanoyl-[acyl-carrier protein]
            C05744  Acetoacetyl-[acp]
            C05745  Butyryl-[acp]
            C05746  3-Oxohexanoyl-[acp]
            C05747  (R)-3-Hydroxyhexanoyl-[acp]
            C05748  trans-Hex-2-enoyl-[acp]
            C05749  Hexanoyl-[acp]
            C05750  3-Oxooctanoyl-[acp]
            C05751  trans-Oct-2-enoyl-[acp]
            C05752  Octanoyl-[acp]
            C05753  3-Oxodecanoyl-[acp]
            C05754  trans-Dec-2-enoyl-[acp]
            C05755  Decanoyl-[acp]
            C05756  3-Oxododecanoyl-[acp]
            C05757  (R)-3-Hydroxydodecanoyl-[acp]
            C05758  trans-Dodec-2-enoyl-[acp]
            C05759  3-Oxotetradecanoyl-[acp]
            C05760  trans-Tetradec-2-enoyl-[acp]
            C05761  Tetradecanoyl-[acp]
            C05762  3-Oxohexadecanoyl-[acp]
            C05763  trans-Hexadec-2-enoyl-[acp]
            C05764  Hexadecanoyl-[acp]
            C06423  Octanoic acid
            C06424  Tetradecanoic acid
            C08362  (9Z)-Hexadecenoic acid
            C16219  3-Oxostearoyl-[acp]
            C16220  (R)-3-Hydroxyoctadecanoyl-[acp]
            C16221  (2E)-Octadecenoyl-[acp]
            C16520  Hexadecenoyl-[acyl-carrier protein]
            C20794  n-7 Unsaturated acyl-[acyl-carrier protein]
REFERENCE   PMID:12061798
  AUTHORS   Salas JJ, Ohlrogge JB.
  TITLE     Characterization of substrate specificity of plant FatA and FatB acyl-ACP thioesterases.
  JOURNAL   Arch Biochem Biophys 403:25-34 (2002)
            DOI:10.1016/S0003-9861(02)00017-6
REFERENCE   PMID:12518017
  AUTHORS   Zhang YM, Marrakchi H, White SW, Rock CO.
  TITLE     The application of computational methods to explore the diversity and structure of bacterial fatty acid synthase.
  JOURNAL   J Lipid Res 44:1-10 (2003)
            DOI:10.1194/jlr.R200016-JLR200
REFERENCE   PMID:11337402
  AUTHORS   Voelker T, Kinney AJ.
  TITLE     VARIATIONS IN THE BIOSYNTHESIS OF SEED-STORAGE LIPIDS.
  JOURNAL   Annu Rev Plant Physiol Plant Mol Biol 52:335-361 (2001)
            DOI:10.1146/annurev.arplant.52.1.335
REFERENCE   PMID:17573542
  AUTHORS   Barker GC, Larson TR, Graham IA, Lynn JR, King GJ.
  TITLE     Novel insights into seed fatty acid synthesis and modification pathways from genetic diversity and quantitative trait Loci analysis of the Brassica C genome.
  JOURNAL   Plant Physiol 144:1827-42 (2007)
            DOI:10.1104/pp.107.096172
KO_PATHWAY  ko00061
///
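
The database entry above is plain text, with a field keyword in the first 12 columns and continuation lines indented beneath it. As a rough sketch, assuming that layout holds (as it does in the output above), the fields can be grouped into a dictionary using only the standard library. The helper name `parse_kegg_flatfile` is hypothetical, and the example string is an abridged excerpt of the entry above so that the cell runs without contacting KEGG.

```python
def parse_kegg_flatfile(text):
    """Group the lines of one KEGG flat-file entry by their field keyword."""
    fields = {}
    current = None
    for line in text.splitlines():
        if line.startswith("///"):      # end-of-entry marker
            break
        if not line.strip():
            continue
        # Keyword sits in the first 12 columns; the value follows
        keyword, value = line[:12].strip(), line[12:].strip()
        if keyword:
            current = keyword           # a new field starts here
        if current is not None:
            fields.setdefault(current, []).append(value)
    return fields


# Abridged excerpt of the ksk00061 entry shown above (NOT the full entry)
example = """ENTRY       ksk00061                    Pathway
NAME        Fatty acid biosynthesis - Kitasatospora setae
CLASS       Metabolism; Lipid metabolism
COMPOUND    C00024  Acetyl-CoA
            C00083  Malonyl-CoA
            C00154  Palmitoyl-CoA
KO_PATHWAY  ko00061
///
"""

fields = parse_kegg_flatfile(example)
# Split each COMPOUND line into an identifier and a name
compounds = dict(value.split(None, 1) for value in fields["COMPOUND"])
print(fields["NAME"][0])
print(compounds)
```

Sub-fields such as AUTHORS and TITLE under REFERENCE are treated as ordinary keywords by this sketch, so entries with several references would need extra handling.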

Retrieving pathway components

As you can see from the database entry for ksk00061 above, the pathway is composed of many GENE and COMPOUND entries, but the flat-text format of the entry makes it awkward to extract them.

You can use the `kegg_link()` function to identify the components of a pathway, by specifying first the `<database>` you want to make a connection to, then the `<entry>` identifier for the database entry you are interested in:
result = REST.kegg_link(<database>, <entry>).read()

For instance, to identify the COMPOUND entries represented in the map00061 pathway, you would compose the query:

result = REST.kegg_link("compound", "map00061").read()

as below:

In [23]:
# Get compounds involved with fatty-acid biosynthesis
result = REST.kegg_link("compound", "map00061").read()
to_df(result)
Out[23]:
0 1
0 path:map00061 cpd:C00024
1 path:map00061 cpd:C00083
2 path:map00061 cpd:C00154
3 path:map00061 cpd:C00229
4 path:map00061 cpd:C00249
5 path:map00061 cpd:C00712
6 path:map00061 cpd:C01203
7 path:map00061 cpd:C01209
8 path:map00061 cpd:C01530
9 path:map00061 cpd:C01571
10 path:map00061 cpd:C02679
11 path:map00061 cpd:C03939
12 path:map00061 cpd:C04088
13 path:map00061 cpd:C04180
14 path:map00061 cpd:C04246
15 path:map00061 cpd:C04618
16 path:map00061 cpd:C04619
17 path:map00061 cpd:C04620
18 path:map00061 cpd:C04633
19 path:map00061 cpd:C04688
20 path:map00061 cpd:C05223
21 path:map00061 cpd:C05744
22 path:map00061 cpd:C05745
23 path:map00061 cpd:C05746
24 path:map00061 cpd:C05747
25 path:map00061 cpd:C05748
26 path:map00061 cpd:C05749
27 path:map00061 cpd:C05750
28 path:map00061 cpd:C05751
29 path:map00061 cpd:C05752
30 path:map00061 cpd:C05753
31 path:map00061 cpd:C05754
32 path:map00061 cpd:C05755
33 path:map00061 cpd:C05756
34 path:map00061 cpd:C05757
35 path:map00061 cpd:C05758
36 path:map00061 cpd:C05759
37 path:map00061 cpd:C05760
38 path:map00061 cpd:C05761
39 path:map00061 cpd:C05762
40 path:map00061 cpd:C05763
41 path:map00061 cpd:C05764
42 path:map00061 cpd:C06423
43 path:map00061 cpd:C06424
44 path:map00061 cpd:C08362
45 path:map00061 cpd:C16219
46 path:map00061 cpd:C16220
47 path:map00061 cpd:C16221
48 path:map00061 cpd:C16520
49 path:map00061 cpd:C20794
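
If you would rather work with plain Python objects than a dataframe, the text returned by `kegg_link()` can be parsed directly. The cell below is a sketch that assumes each line holds two tab-separated identifiers (source entry, then linked entry), as in the table above; the helper name `parse_link` is hypothetical, and the example string reproduces only the first three rows of that table.

```python
from collections import defaultdict


def parse_link(text):
    """Map each source entry to the list of entries it links to."""
    links = defaultdict(list)
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) < 2:
            continue                    # skip blank or malformed lines
        source, target = parts[0].strip(), parts[1].strip()
        links[source].append(target)
    return dict(links)


# First three rows of the table above, written as tab-separated text
example = (
    "path:map00061\tcpd:C00024\n"
    "path:map00061\tcpd:C00083\n"
    "path:map00061\tcpd:C00154\n"
)

links = parse_link(example)
# Drop the "cpd:" database prefix to leave bare compound identifiers
compound_ids = [t.split(":", 1)[1] for t in links["path:map00061"]]
print(compound_ids)
```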

You can use any of the databases in KEGG with this function, though not every database will return a result for a given query.

You can use this function to query generic pathways against the very useful reference databases of KEGG:

  • ko: KEGG orthologues - a collection of functional orthologues
  • ec: EC numbers - a collection of Enzyme Commission classifications
  • rn: REACTION entries - descriptions of chemical interconversions

For example, to identify reactions that are involved in the fatty-acid synthesis pathway, and then get the database entry for one of these, you could use the queries in the cells below:

In [24]:
# Get reactions involved with fatty-acid biosynthesis
result = REST.kegg_link("rn", "map00061").read()
to_df(result)
Out[24]:
0 1
0 path:map00061 rn:R00742
1 path:map00061 rn:R01280
2 path:map00061 rn:R01624
3 path:map00061 rn:R01626
4 path:map00061 rn:R01706
5 path:map00061 rn:R02814
6 path:map00061 rn:R03370
7 path:map00061 rn:R04014
8 path:map00061 rn:R04355
9 path:map00061 rn:R04428
10 path:map00061 rn:R04429
11 path:map00061 rn:R04430
12 path:map00061 rn:R04533
13 path:map00061 rn:R04534
14 path:map00061 rn:R04535
15 path:map00061 rn:R04536
16 path:map00061 rn:R04537
17 path:map00061 rn:R04543
18 path:map00061 rn:R04544
19 path:map00061 rn:R04566
20 path:map00061 rn:R04568
21 path:map00061 rn:R04724
22 path:map00061 rn:R04725
23 path:map00061 rn:R04726
24 path:map00061 rn:R04952
25 path:map00061 rn:R04953
26 path:map00061 rn:R04954
27 path:map00061 rn:R04955
28 path:map00061 rn:R04956
29 path:map00061 rn:R04957
30 path:map00061 rn:R04958
31 path:map00061 rn:R04959
32 path:map00061 rn:R04960
33 path:map00061 rn:R04961
34 path:map00061 rn:R04962
35 path:map00061 rn:R04963
36 path:map00061 rn:R04964
37 path:map00061 rn:R04965
38 path:map00061 rn:R04966
39 path:map00061 rn:R04967
40 path:map00061 rn:R04968
41 path:map00061 rn:R04969
42 path:map00061 rn:R04970
43 path:map00061 rn:R07639
44 path:map00061 rn:R07762
45 path:map00061 rn:R07763
46 path:map00061 rn:R07764
47 path:map00061 rn:R07765
48 path:map00061 rn:R08157
49 path:map00061 rn:R08158
50 path:map00061 rn:R08159
51 path:map00061 rn:R08161
52 path:map00061 rn:R08162
53 path:map00061 rn:R08163
54 path:map00061 rn:R10700
55 path:map00061 rn:R10707
56 path:map00061 rn:R10714
In [25]:
# Get the database entry for reaction R00742
result = REST.kegg_get("R00742").read()
print(result)
ENTRY       R00742                      Reaction
NAME        acetyl-CoA:carbon-dioxide ligase (ADP-forming)
DEFINITION  ATP + Acetyl-CoA + HCO3- <=> ADP + Orthophosphate + Malonyl-CoA
EQUATION    C00002 + C00024 + C00288 <=> C00008 + C00009 + C00083
COMMENT     two-step reaction (see R04385 + R04386)
RCLASS      RC00002  C00002_C00008
            RC00040  C00024_C00083
            RC00367  C00083_C00288
ENZYME      6.4.1.2
PATHWAY     rn00061  Fatty acid biosynthesis
            rn00254  Aflatoxin biosynthesis
            rn00620  Pyruvate metabolism
            rn00640  Propanoate metabolism
            rn00720  Carbon fixation pathways in prokaryotes
            rn01100  Metabolic pathways
            rn01110  Biosynthesis of secondary metabolites
            rn01120  Microbial metabolism in diverse environments
            rn01130  Biosynthesis of antibiotics
            rn01200  Carbon metabolism
            rn01212  Fatty acid metabolism
MODULE      M00082  Fatty acid biosynthesis, initiation
            M00375  Hydroxypropionate-hydroxybutylate cycle
            M00376  3-Hydroxypropionate bi-cycle
ORTHOLOGY   K01946  acetyl-CoA carboxylase / biotin carboxylase 2 [EC:6.4.1.2 6.3.4.14 2.1.3.15]
            K01961  acetyl-CoA carboxylase, biotin carboxylase subunit [EC:6.4.1.2 6.3.4.14]
            K01962  acetyl-CoA carboxylase carboxyl transferase subunit alpha [EC:6.4.1.2 2.1.3.15]
            K01963  acetyl-CoA carboxylase carboxyl transferase subunit beta [EC:6.4.1.2 2.1.3.15]
            K01964  acetyl-CoA/propionyl-CoA carboxylase [EC:6.4.1.2 6.4.1.3]
            K02160  acetyl-CoA carboxylase biotin carboxyl carrier protein
            K11262  acetyl-CoA carboxylase / biotin carboxylase 1 [EC:6.4.1.2 6.3.4.14 2.1.3.15]
            K11263  acetyl-CoA/propionyl-CoA carboxylase, biotin carboxylase, biotin carboxyl carrier protein [EC:6.4.1.2 6.4.1.3 6.3.4.14]
            K15036  acetyl-CoA/propionyl-CoA carboxylase [EC:6.4.1.2 6.4.1.3 2.1.3.15]
            K15037  biotin carboxyl carrier protein
            K18472  acetyl-CoA/propionyl-CoA carboxylase carboxyl transferase subunit [EC:6.4.1.2 6.4.1.3 2.1.3.15]
            K18603  acetyl-CoA/propionyl-CoA carboxylase [EC:6.4.1.2 6.4.1.3]
            K18604  acetyl-CoA/propionyl-CoA carboxylase [EC:6.4.1.2 6.4.1.3 2.1.3.15]
            K18605  biotin carboxyl carrier protein
DBLINKS     RHEA: 11311
///
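
The EQUATION field of a REACTION entry lists compound identifiers on either side of `<=>`, which makes it straightforward to recover the substrates and products. The cell below is a minimal sketch; the helper name `parse_equation` is hypothetical, and it assumes terms are separated by ` + ` and that any stoichiometric coefficient appears before the identifier. The example string is the EQUATION line from the R00742 entry above.

```python
def parse_equation(equation):
    """Split a KEGG EQUATION string into substrate and product ID lists."""
    left, right = equation.split("<=>")

    def side(text):
        # Keep the last token of each term, so that a leading
        # coefficient (if present) is dropped
        return [term.split()[-1] for term in text.split(" + ")]

    return side(left), side(right)


# EQUATION line from the R00742 entry above
equation = "C00002 + C00024 + C00288 <=> C00008 + C00009 + C00083"
substrates, products = parse_equation(equation)
print(substrates)
print(products)
```

Each identifier could then be looked up with `REST.kegg_get()`, as for `cpd:C00051` earlier in this notebook.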

Exercise 01 (15min)

The UniProt record Q05655 describes a human protein kinase. Using KEGG, can you discover:

  • Which genes are associated with this UniProt entry?