00 - The Challenge

Motivation

You work at Consolidated Incorporated, a producer of biodiesels: methyl esters of fatty acids that are usually produced by base-catalyzed transesterification of triacylglycerol with methanol.

Some lipase enzymes are effective catalysts for biodiesel synthesis. Natural lipases are often rapidly inactivated by the high methanol concentrations used for biodiesel synthesis, limiting their practical use. The Biochemistry Unit here at Consolidated Incorporated have identified a lipase from Proteus mirabilis that is a particularly promising catalyst for biodiesel synthesis as it produces high yields of methyl esters even in the presence of large amounts of water and expresses very well in Escherichia coli.

However, since the Proteus mirabilis lipase is only moderately stable and methanol tolerant, these properties needed to be improved before the enzyme could be used industrially. The Product Development Unit have used directed evolution to generate a modified enzyme that has greatly improved thermal stability and dramatically increased methanol tolerance, and that retains the ability to synthesize biodiesel.

The Product Development Unit have asked you to investigate the publicly available information about this enzyme and/or its near relatives, with a specific remit to report and comment on the diversity of sequence and function among this enzyme's relatives, and to investigate which features of the modified enzyme might explain its improved performance.

Activities

You are provided with nucleotide sequences, in FASTA format, for the wild-type P. mirabilis lipase and for an engineered form of this protein.

WT: data/wildtype_nt.fasta
Engineered: data/engineered_nt.fasta

Your manager has asked you to produce a Jupyter notebook, combining analysis and report in one document, taking these sequences through the following processes:

- mapping/searching for a CDS on a genome
  - working with sequences in Biopython
  - identifying a gene model
  - identifying a putative regulatory region
- identifying homologues in public databases
  - BLAST/NCBI
  - UniProt
  - KEGG
  - generating an orthologue/homologue set
  - identifying sites of conservation/diversity
- identifying related structural data
  - RCSB search
  - JMOL
  - JPRED
  - NetSurfP
  - SwissModel
- comparing engineered to wild type sequence
  - biological/functional interpretation from sequence data
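The final activity, comparing the engineered sequence to the wild type, can be sketched in a few lines. The example below is a minimal illustration using only the Python standard library, not the notebook's prescribed method: the helper names `read_fasta` and `substitutions` are hypothetical, the records are toy data rather than the real lipase sequences, and it assumes the two sequences are the same length (substitutions only, no insertions or deletions).

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def substitutions(wildtype, engineered):
    """List (1-based position, wild-type symbol, engineered symbol) mismatches.

    Assumption: both sequences have the same length. Sequences that differ
    by insertions or deletions need a proper alignment first.
    """
    if len(wildtype) != len(engineered):
        raise ValueError("sequences differ in length; align them first")
    return [
        (position, wt, eng)
        for position, (wt, eng) in enumerate(zip(wildtype, engineered), start=1)
        if wt != eng
    ]


# Toy records standing in for the two FASTA files (not real lipase data).
toy = StringIO(">wt\nATGGCTAAA\n>eng\nATGGTTAAA\n")
records = dict(read_fasta(toy))
print(substitutions(records["wt"], records["eng"]))  # → [(5, 'C', 'T')]
```

In the notebook itself, Biopython's `SeqIO.read(path, "fasta")` would normally replace the hand-written parser, and the same comparison is usually more informative on the translated protein sequences than on the nucleotides.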

Sequences

The nucleotide sequences provided by the Product Development Unit are located at:

wildtype: data/wildtype_nt.fasta
engineered variant: data/engineered_nt.fasta

A reference GenBank assembly for Proteus mirabilis is provided at:

data/GCA_000069965.1_ASM6996v1_genomic.fna

with feature annotations in GenBank format at:

data/GCA_000069965.1_ASM6996v1_genomic.gbk

and annotated CDS sequences and their translations at:

data/GCA_000069965.1_ASM6996v1_cds_from_genomic.fna
data/GCA_000069965.1_ASM6996v1_protein.faa
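As a first look at how these files relate, one option is to search the genome sequence for an exact copy of the wild-type CDS on either strand. The sketch below uses only the Python standard library and toy strings; the function names `reverse_complement` and `find_exact` are hypothetical, and the approach assumes the query matches the reference assembly base for base. If it does not (for example because of strain differences), a similarity search such as BLAST is the more appropriate tool.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_exact(genome, query):
    """Find exact matches of query in genome on both strands.

    Returns a list of (start, end, strand) tuples. Coordinates are 0-based,
    half-open, and always given on the forward strand; strand is +1 or -1.
    A query equal to its own reverse complement is reported on both strands.
    """
    genome, query = genome.upper(), query.upper()
    hits = []
    for strand, probe in ((+1, query), (-1, reverse_complement(query))):
        start = genome.find(probe)
        while start != -1:
            hits.append((start, start + len(probe), strand))
            # Step forward by one so overlapping matches are also found.
            start = genome.find(probe, start + 1)
    return hits


# Toy data (not the real genome or lipase CDS).
toy_genome = "TTTATGGCTAAACCC"
toy_cds = "ATGGCTAAA"
print(find_exact(toy_genome, toy_cds))  # → [(3, 12, 1)]
```

With the real data, the genome and query strings would come from `data/GCA_000069965.1_ASM6996v1_genomic.fna` and `data/wildtype_nt.fasta`, for instance via Biopython's `SeqIO.parse`. An assembly FASTA file may hold more than one record, so each record would be searched in turn.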