You work at Consolidated Incorporated, a producer of biodiesels: methyl esters of fatty acids that are usually produced by base-catalyzed transesterification of triacylglycerol with methanol.
Some lipase enzymes are effective catalysts for biodiesel synthesis. Natural lipases are often rapidly inactivated by the high methanol concentrations used for biodiesel synthesis, limiting their practical use. The Biochemistry Unit here at Consolidated Incorporated have identified a lipase from Proteus mirabilis that is a particularly promising catalyst for biodiesel synthesis as it produces high yields of methyl esters even in the presence of large amounts of water and expresses very well in Escherichia coli.
However, since the Proteus mirabilis lipase is only moderately stable and methanol tolerant, these properties need to be improved before the enzyme can be used industrially. The Product Development Unit have used directed evolution to generate a modified enzyme that has greatly improved thermal stability and dramatically increased methanol tolerance, and that retains the ability to synthesize biodiesel.
The Product Development Unit have asked you to investigate the publicly available information about this enzyme and/or its near relatives, with a specific remit to report and comment on the diversity of sequence and function of this enzyme's relatives, and to investigate what features of the modified enzyme might explain its improved performance.
You are provided with the FASTA nucleotide sequences for the wild-type P. mirabilis lipase, and for an engineered form of this protein.
Your manager has asked you to produce a Jupyter notebook, combining analysis and report in one document, taking these sequences through the following processes:
mapping/searching for a CDS on a genome
identifying homologues in public databases
identifying related structural data
comparing engineered to wild type sequence
The nucleotide sequences provided by the Product Development Unit are located at:
data/wildtype_nt.fasta
data/engineered_nt.fasta
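As a starting point for comparing the engineered and wild-type sequences, the two FASTA files can be read, translated, and compared residue by residue. The following is a minimal sketch using only the Python standard library, not a prescribed method. The helper names (`read_fasta`, `translate`, `substitutions`) are hypothetical, the toy sequences stand in for the real files, and the sketch assumes both sequences are in frame, of equal length, and differ only by point substitutions. If the real sequences contain insertions or deletions, they would need to be aligned first.

```python
import io
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def read_fasta(handle):
    """Return {record_id: sequence} from an open FASTA handle (hypothetical helper)."""
    records, name = {}, None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            name = header[0] if header else ""
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def translate(nt):
    """Translate an in-frame nucleotide sequence; unrecognised codons become 'X'."""
    usable = len(nt) - len(nt) % 3  # ignore any trailing partial codon
    codons = (nt[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def substitutions(wild, engineered):
    """List point substitutions such as 'G2R', using 1-based positions."""
    if len(wild) != len(engineered):
        raise ValueError("sequences differ in length; align them first")
    return [
        f"{w}{pos}{e}"
        for pos, (w, e) in enumerate(zip(wild, engineered), start=1)
        if w != e
    ]


# Toy stand-ins (an assumption) for the two files listed above.
toy_wt = io.StringIO(">wt toy\nATGGGTAAA\nTAA\n")
toy_eng = io.StringIO(">eng toy\nATGCGTAAATAA\n")
wt_protein = translate(next(iter(read_fasta(toy_wt).values())))
eng_protein = translate(next(iter(read_fasta(toy_eng).values())))
changes = substitutions(wt_protein, eng_protein)
print(changes)  # ['G2R']
```

With the real data, the `io.StringIO` objects would be replaced by file handles, for example `open("data/wildtype_nt.fasta")`. A dedicated library such as Biopython would likely be a more robust choice in the notebook itself.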
A reference GenBank assembly for Proteus mirabilis is provided at:
data/GCA_000069965.1_ASM6996v1_genomic.fna
with feature annotations in GenBank format at:
data/GCA_000069965.1_ASM6996v1_genomic.gbk
and annotated CDS sequences and their translations at:
data/GCA_000069965.1_ASM6996v1_cds_from_genomic.fna
data/GCA_000069965.1_ASM6996v1_protein.faa
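One simple way to begin mapping a CDS onto the reference genome is to look for an exact copy of the query on either strand of each contig. The sketch below illustrates that idea with the standard library only. The function names and the toy contig are hypothetical, and exact matching is an assumption that may well fail here: the wild-type sequence may not be identical to the reference assembly, in which case an alignment-based search (for example BLAST) would be needed instead.

```python
def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def find_exact(query, contigs):
    """Yield (contig_id, start, end, strand) for every exact match.

    Coordinates are 0-based and end-exclusive, reported on the forward strand.
    """
    query = query.upper()
    for strand, pattern in (("+", query), ("-", reverse_complement(query))):
        for name, seq in contigs.items():
            seq = seq.upper()
            pos = seq.find(pattern)
            while pos != -1:
                yield name, pos, pos + len(pattern), strand
                pos = seq.find(pattern, pos + 1)  # allow overlapping matches


# Toy contig (an assumption) carrying the query on its reverse strand.
toy_contigs = {"toy_contig": "GGTTATTTACCCATCC"}
hits = list(find_exact("ATGGGTAAATAA", toy_contigs))
print(hits)  # [('toy_contig', 2, 14, '-')]
```

In the notebook, `toy_contigs` would be replaced by the contigs read from the genomic FASTA file listed above, and any reported coordinates could then be checked against the feature annotations in the GenBank file.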