06/06/2017
I'm not really a computer scientist…
Learning on the job…
Vision
Mission
Group Objective
@huttonics
Soft rot enterobacteria
Potato late blight
The mechanics are complex: systems-level approaches required
Prevailing interaction model: "Zig-Zag"
Quantitative, specific timescales and molecular interactions
published 2009, focus on pathogen effectors
is a supervised (machine) learning problem
effectors are modular (address and payload)
RxLR motif: characteristic sequence
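As a rough illustration of what "characteristic sequence" means here, the motif can be written as a pattern: arginine, any residue, leucine, arginine. The sketch below is only a plain pattern scan and not the supervised classifier referred to above; the function name `find_rxlr` and the example sequences are made up for illustration.

```python
import re

# RxLR: arginine (R), any residue (x), leucine (L), arginine (R).
# A lookahead is used so that overlapping occurrences are all reported.
RXLR = re.compile(r"(?=R.LR)")


def find_rxlr(protein):
    """Return 0-based start positions of every RxLR motif in a protein string."""
    return [match.start() for match in RXLR.finditer(protein.upper())]


# Hypothetical sequences, not real effectors.
examples = ["MKTARSLRAA", "RALRALR", "MKTAAA"]
for seq in examples:
    print(seq, find_rxlr(seq))
```

A motif hit alone says little; a real effector predictor would combine it with other evidence, which is where the learning problem comes in.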
published 2011, focus on R-genes (NB-LRR)
≈488 candidates identified, 366 placed on genome
MEME \(\rightarrow\) 20 characteristic domains; MAST search of genome
commercial gene enrichment bead 'array', based on NB-LRR model
16S central to microbial ecology (microbiome)
rDNA flanking regions differ within a genome, not between genomes
artificial chromosome: spaced E. coli rDNA clusters
most rDNA clusters assembled when ref. mutation rate \(\leq\) 0.03
hybrid (PacBio/Illumina) assembly reference
Closed scaffold of Illumina-only Staphylococcus aureus UAM-1 draft genome
future directions
riboSeed on GitHub: https://nickp60.github.io/riboSeed/
Historical classification mostly phenotypic, polyphasic
European and Mediterranean Plant Protection Organisation (EPPO)
Seed Potatoes (Scotland) Amendment Regulations (2010)
consortium for control and epidemiology
Easy to incorporate into legislation (binary classification)
Essentially a data structure problem!
To legislate effectively, must discriminate and identify the pathogen
Primer3
The first design run could not predict diagnostic primers!
Real-world impacts of misclassification
≈18% of genomes in public databases misclassified by species
Designed primers that discriminate at species level across Dickeya
Also designed primers that discriminate RxLR variants (population surveys)
Designed primers at subserotype level for E. coli O104:H4 outbreak
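The idea of a diagnostic sequence can be sketched with sets: keep the short subsequences found in every target genome and discard any that also occur in a non-target genome. This is a toy illustration under that assumption, not how Primer3 or the design pipeline behind these slides works; the function names and sequences are hypothetical.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def diagnostic_kmers(targets, off_targets, k):
    """k-mers present in every target and absent from every off-target.

    Requires at least one target sequence.
    """
    shared = set.intersection(*(kmers(s, k) for s in targets))
    excluded = set().union(*(kmers(s, k) for s in off_targets))
    return shared - excluded


# Hypothetical toy sequences standing in for genomes.
targets = ["ACGTAC", "TACGTA"]
off_targets = ["GTACCC"]
print(sorted(diagnostic_kmers(targets, off_targets, k=3)))
```

If a target is misclassified in the database, the shared set can shrink to nothing, which is one way a design run can fail to return any diagnostic candidates.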
Genomics has transformed outbreak detection and prediction
"Gold standard" whole-genome classification since 1960s
Whole-genome sequence replacement for DDH
pyani: Python package and scripts for ANI (PyPI)
ANIm %ID indicates reclassification of Dickeya.
ANIm %ID indicates reclassification of Pectobacterium.
ANIm %coverage: all OK for Pectobacterium spp.
ANIm %coverage highlights an issue
Increasing interest in whole-genome classification
ANIm of all sequenced SRE genomes. Edges > 50% coverage
cliques - k-complete graphs - are 'natural' clusterings
at some %identity values, all graph components are cliques
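The clique observation can be checked mechanically: keep only edges at or above a chosen %identity, split the graph into connected components, and test whether every component is fully connected. The following is a minimal standard-library sketch with invented identity values. It ignores the coverage filter mentioned above and does not use pyani; all names are illustrative.

```python
def components(nodes, edges):
    """Return (connected components, adjacency map) for an undirected graph."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for n in nodes:
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:  # iterative depth-first search
            cur = stack.pop()
            if cur in comp:
                continue
            comp.add(cur)
            stack.extend(adj[cur] - comp)
        seen |= comp
        comps.append(comp)
    return comps, adj


def all_components_are_cliques(nodes, identity, threshold):
    """Threshold pairwise identities, then test each component for completeness."""
    edges = [pair for pair, pid in identity.items() if pid >= threshold]
    comps, adj = components(nodes, edges)
    # A component is a clique when every member is adjacent to all the others.
    ok = all(adj[a] >= comp - {a} for comp in comps for a in comp)
    return ok, comps


# Hypothetical genomes and pairwise identities (fractions, not real data).
genomes = ["A", "B", "C", "D"]
identity = {
    ("A", "B"): 0.97, ("B", "C"): 0.96, ("A", "C"): 0.93,
    ("C", "D"): 0.80, ("A", "D"): 0.79, ("B", "D"): 0.81,
}
for threshold in (0.90, 0.95, 0.965):
    ok, comps = all_components_are_cliques(genomes, identity, threshold)
    print(threshold, ok, [sorted(c) for c in comps])
```

Thresholds where the check passes give unambiguous groupings; where it fails, at least one genome links two groups without belonging fully to either.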
JHI holds historical pathogen samples from 1950s onwards
Sequenced ≈50 P. atrosepticum isolates from infections (2009-2015)
prokka, roary, QUAST, parSNP
SNPs widespread across all P. atrosepticum genomes
parSNP
Clades: P. atrosepticum divisible into four clades
all four clades widespread, no obvious geographical pattern
15% of P. atrosepticum genes are 'accessory'
accessory gene tree not congruent with the SNP tree
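A figure such as "15% accessory" comes from a gene presence/absence table. The sketch below assumes the strictest definition, where a gene is core only if every genome carries it; pan-genome tools may apply a softer cutoff. The gene and genome names are hypothetical, and this is not roary's own code.

```python
def split_core_accessory(presence):
    """Split genes into core (in every genome) and accessory (in only some).

    presence maps each gene name to the set of genomes that carry it.
    """
    genomes = set().union(*presence.values())
    core = {gene for gene, carriers in presence.items() if carriers == genomes}
    accessory = set(presence) - core
    return core, accessory


# Hypothetical presence/absence data for three genomes.
presence = {
    "geneA": {"g1", "g2", "g3"},
    "geneB": {"g1", "g2"},
    "geneC": {"g1", "g2", "g3"},
    "geneD": {"g3"},
}
core, accessory = split_core_accessory(presence)
fraction = len(accessory) / len(presence)
print(sorted(core), sorted(accessory), fraction)
```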