Supplementary information for pyani analyses reported in Hugouvieux-Cotte-Pattat et al. (2021) IJSEM, describing the novel genus Musicola
View the Project on GitHub widdowquinn/SI_Hugouvieux-Cotte-Pattat_2021
We conducted whole-genome classification and a multigene maximum-likelihood phylogenetic reconstruction on a set of 49 genomes spanning five genera in the Pectobacteriaceae (including Musicola) to establish relative placement of the proposed Musicola genus. The genomes used were:
Each genome was downloaded from NCBI with ncbi-genome-download v0.3.0 (https://github.com/kblin/ncbi-genome-download/) using the accession ID as identified. To ensure consistency of annotation between genomes, all sequences were reannotated using prodigal v2.6.3 (Hyatt et al. 2010) to obtain the predicted proteome.
Whole-genome classification of the 49 genomes was performed using pyani v0.3.0b (Pritchard et al. 2016) and the ANIm algorithm. Taking 94-96% identity as an approximate threshold corresponding to species division, and 40-50% coverage as an approximate threshold corresponding to genus division, the results support the following eight genus divisions:
and 22 species divisions:
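To illustrate how identity and coverage thresholds of this kind partition a set of genomes, the sketch below groups genomes by single-linkage clustering: two genomes fall in the same group whenever a chain of pairwise scores at or above the threshold connects them. This is an assumption made for illustration, not a description of how pyani or the authors derived the divisions. The function name, genome labels and scores are all hypothetical, and each pairwise score is treated as symmetric even though ANIm coverage can differ by direction.

```python
def threshold_groups(names, scores, threshold):
    """Group genomes by single linkage on pairwise scores.

    names: iterable of genome labels.
    scores: dict mapping (genome_a, genome_b) -> score in the range 0..1.
    threshold: pairs scoring at or above this value are linked.
    """
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for (a, b), score in scores.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical scores for four genomes (not taken from the analysis).
genomes = ["g1", "g2", "g3", "g4"]
identity = {("g1", "g2"): 0.98, ("g1", "g3"): 0.90, ("g2", "g3"): 0.91,
            ("g1", "g4"): 0.85, ("g2", "g4"): 0.85, ("g3", "g4"): 0.86}
coverage = {("g1", "g2"): 0.90, ("g1", "g3"): 0.70, ("g2", "g3"): 0.68,
            ("g1", "g4"): 0.20, ("g2", "g4"): 0.22, ("g3", "g4"): 0.25}

species = threshold_groups(genomes, identity, 0.95)  # species-level identity cut
genera = threshold_groups(genomes, coverage, 0.50)   # genus-level coverage cut
print(species)
print(genera)
```

Because the thresholds are given as approximate ranges, it may be worth rerunning such a grouping at both ends of each range (for example 0.94 and 0.96) to see whether the divisions are stable.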
A total of 1201 single-copy orthologues were identified as present in the predicted proteomes (amino acid sequences) of all 49 genomes, using orthofinder v2.5.2 (Emms & Kelly 2019). The protein sequences for each of the 1201 genes were aligned using MAFFT v7.480 (Nakamura et al. 2018) and the corresponding CDS sequences threaded onto these alignments using t-coffee v12.00.7fb08c2 (Notredame et al. 2000). The nucleotide alignments were concatenated into a single sequence per genome using the Python script concatenate_cds.py, which also generated a partition file (one partition per gene) for the subsequent maximum-likelihood phylogenetic reconstruction.
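The concatenation step can be sketched as follows. This is a minimal illustration and not the contents of concatenate_cds.py: the function name, the in-memory input layout (gene name mapped to a dictionary of genome label and aligned sequence) and the example sequences are assumptions. The partition lines follow the "MODEL, name = start-end" layout used by RAxML-style partition files, with one partition per gene.

```python
def concatenate_alignments(gene_alignments, model="GTR+FO+G4m+B"):
    """Concatenate per-gene alignments and build one partition line per gene.

    gene_alignments: dict mapping gene name -> {genome label: aligned sequence}.
    Returns (dict of genome label -> concatenated sequence, list of partition lines).
    """
    concatenated = {}
    partitions = []
    start = 1  # partition coordinates are 1-based and inclusive
    for gene, alignment in gene_alignments.items():
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: aligned sequences differ in length")
        if concatenated and set(alignment) != set(concatenated):
            raise ValueError(f"{gene}: genome set differs from earlier genes")
        length = lengths.pop()
        for genome, seq in alignment.items():
            concatenated[genome] = concatenated.get(genome, "") + seq
        partitions.append(f"{model}, {gene} = {start}-{start + length - 1}")
        start += length
    return concatenated, partitions


# Hypothetical two-gene, two-genome example.
example = {
    "OG1": {"genomeA": "ATG---", "genomeB": "ATGAAA"},
    "OG2": {"genomeA": "CCC", "genomeB": "C-C"},
}
sequences, partition_lines = concatenate_alignments(example)
print(sequences["genomeA"])
print("\n".join(partition_lines))
```

Checking that every gene covers the same set of genomes matters here because single-copy orthologues are expected in all genomes; a missing genome would silently shift the partition coordinates for that row.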
The concatenated nucleotide sequence alignment of 1201 single-copy orthologues and corresponding partition file were used as input to raxml-ng v1.0.2 (Kozlov et al. 2019). Initial processing with raxml-ng recommended the GTR+FO+G4m+B model for each of the 1201 genes, and the partition file was used to allow individual parameterisation of this model for each gene. Tree searches from 20 starting trees all found a single topology, suggesting that this was the globally optimal topology. One hundred bootstrap replicate trees were determined to estimate support values for each tree partition; MRE-based bootstopping indicated that convergence was reached with only 50 replicates. The best estimate from the 20 starting trees was midpoint-rooted, manually annotated and coloured using figtree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/).
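Whether a set of inferred trees shares a single topology can be checked by comparing their bipartitions (splits), ignoring branch lengths, rooting and child order. The sketch below does this in plain Python under stated assumptions: trees are well-formed Newick strings with unique, unquoted leaf names, and the function names and example trees are hypothetical. It is not the comparison performed by raxml-ng.

```python
import re


def bipartitions(newick):
    """Return the non-trivial bipartitions of a tree given as a Newick string.

    Each bipartition is stored as the side that excludes a fixed reference
    leaf, so that rooting and child order do not affect the result.
    """
    tokens = re.findall(r"[(),;]|[^(),;]+", newick.strip())
    stack, clades, leaves = [], [], set()
    previous = None
    for token in tokens:
        if token == "(":
            stack.append(set())
        elif token == ")":
            clade = stack.pop()
            clades.append(frozenset(clade))
            if stack:
                stack[-1] |= clade
        elif token not in (",", ";") and previous != ")":
            # A label that does not follow ")" is a leaf; drop any branch length.
            name = token.split(":")[0].strip()
            if name:
                leaves.add(name)
                stack[-1].add(name)
        previous = token
    reference = min(leaves)
    splits = set()
    for clade in clades:
        side = clade if reference not in clade else frozenset(leaves - clade)
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(leaves) - 2:
            splits.add(side)
    return frozenset(splits)


def same_topology(trees):
    """True if all Newick trees share the same set of bipartitions."""
    return len({bipartitions(tree) for tree in trees}) == 1


# Hypothetical four-leaf trees: the first three differ only in rooting, child
# order, branch lengths and support labels; the fourth groups leaves differently.
trees = [
    "((A,B),(C,D));",
    "((D:0.1,C:0.2)95:0.3,(B:0.1,A:0.1):0.2);",
    "(A,B,(C,D));",
]
print(same_topology(trees))
print(same_topology(trees + ["((A,C),(B,D));"]))
```

The same bipartition sets could also be counted across bootstrap replicates to obtain the fraction of replicates supporting each split, which is the quantity that bootstrap support values summarise.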
The resulting reconstruction supports the same genus and species divisions noted above for whole-genome classification using pyani.
Both ANIm and a comprehensive multigene phylogeny support the same genus and species divisions, including establishment of Musicola as a novel genus.
We note in passing that these approaches also support division of Brenneria into multiple genus-level groups, establishment of a further genus-level group circumscribing genomes currently described as D. aquatica and D. lacustris, and reassignment of members of L. quercina.
Emms, D.M. and Kelly, S. (2019) OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biology 20:238
Hyatt, D., Chen, G.L., Locascio, P.F., Land, M.L., Larimer, F.W. and Hauser, L.J. (2010) Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11:119. doi:10.1186/1471-2105-11-119. PMID: 20211023; PMCID: PMC2848648
Kozlov, A.M., Darriba, D., Flouri, T., Morel, B. and Stamatakis, A. (2019) RAxML-NG: a fast, scalable, and user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics, btz305. doi:10.1093/bioinformatics/btz305
Nakamura, Yamada, Tomii and Katoh (2018) Parallelization of MAFFT for large-scale multiple sequence alignments. Bioinformatics 34:2490–2492
Notredame, Higgins and Heringa (2000) T-Coffee: a novel method for multiple sequence alignments. JMB 302:205-217
Pritchard et al. (2016) Genomics and taxonomy in diagnostics for food security: soft-rotting enterobacterial plant pathogens. Anal. Methods 8:12-24. doi:10.1039/C5AY02550H