SI_Hugouvieux-Cotte-Pattat_2021

Supplementary information for pyani analyses reported in Hugouvieux-Cotte-Pattat et al. (2021) IJSEM, describing the novel genus Musicola


Phylogenomic analysis supporting establishment of Musicola gen. nov.

We conducted whole-genome classification and a multigene maximum-likelihood phylogenetic reconstruction on a set of 49 genomes spanning five genera in the Pectobacteriaceae (including Musicola) to establish relative placement of the proposed Musicola genus. The genomes used were:

Each genome was downloaded from NCBI with ncbi-genome-download v0.3.0 (https://github.com/kblin/ncbi-genome-download/) using the accession ID as identified. To ensure consistency of annotation between genomes, all sequences were reannotated using prodigal v2.6.3 (Hyatt et al. 2010) to obtain the predicted proteome.
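The download and reannotation steps above can be sketched as a pair of command builders. This is a minimal illustration, not the commands actually used for this analysis: the accession `GCA_000000000.1` is a placeholder, the `genbank` section and `bacteria` group are assumptions, the output directory names are hypothetical, and the genome FASTA file is assumed to have been decompressed before being passed to prodigal.

```python
from pathlib import Path


def download_command(accession, outdir="genomes"):
    """Build an ncbi-genome-download call for one assembly accession."""
    return [
        "ncbi-genome-download",
        "--section", "genbank",            # assumption: GenBank (GCA_) assemblies
        "--formats", "fasta",              # genome nucleotide sequence only
        "--assembly-accessions", accession,
        "--output-folder", outdir,         # hypothetical directory name
        "bacteria",                        # assumption: taxonomic group argument
    ]


def prodigal_command(genome_fasta, outdir="proteomes"):
    """Build a prodigal call that writes proteins, CDS and GFF for one genome."""
    stem = Path(genome_fasta).stem
    out = Path(outdir)                     # hypothetical directory name
    return [
        "prodigal",
        "-i", str(genome_fasta),           # input genome (uncompressed FASTA)
        "-a", str(out / f"{stem}.faa"),    # predicted protein sequences
        "-d", str(out / f"{stem}.fna"),    # nucleotide sequences of predicted genes
        "-o", str(out / f"{stem}.gff"),    # gene coordinates
        "-f", "gff",                       # coordinate output format
    ]


if __name__ == "__main__":
    # Print the commands rather than running them; pass each list to
    # subprocess.run(cmd, check=True) to execute.
    print(" ".join(download_command("GCA_000000000.1")))
    print(" ".join(prodigal_command("GCA_000000000.1_genomic.fna")))
```

Keeping both the protein (`-a`) and nucleotide (`-d`) outputs from the same prodigal run means the CDS sequences needed later for threading onto protein alignments share identifiers with the predicted proteome.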

Whole-Genome Classification

Whole-genome classification of the 49 genomes was performed using pyani v0.3.0b (Pritchard et al. 2016) and the ANIm algorithm. Taking 94-96% identity as an approximate threshold corresponding to species division, and 40-50% coverage as an approximate threshold corresponding to genus division, the results support the following eight genus divisions:

  1. Dickeya (D. solani, D. dadantii, D. fangzhongdai, D. undicola, D. dianthicola, D. poaceiphila, D. zeae, D. chrysanthemi)
  2. Musicola (M. paradisiaca, M. keenii, formerly D. paradisiaca)
  3. Gen. nov. I (D. aquatica, D. lacustris)
  4. Lonsdalea (L. iberica, L. quercina, L. britannica)
  5. Pectobacterium (P. atrosepticum, P. wasabiae, P. parvum)
  6. Gen. nov. II (B. roseae)
  7. Gen. nov. III (B. alni)
  8. Gen. nov. IV (B. goodwinii)

and 22 species divisions:

  1. D. undicola
  2. D. dianthicola
  3. D. fangzhongdai
  4. D. solani
  5. D. dadantii
  6. D. zeae
  7. D. poaceiphila
  8. D. chrysanthemi
  9. P. parvum
  10. P. wasabiae
  11. P. atrosepticum
  12. M. paradisiaca
  13. M. keenii
  14. L. quercina ATCC 29281
  15. L. iberica
  16. L. sp. nov. (currently L. quercina CFCC 11059, L. quercina CFCC 13731)
  17. L. britannica
  18. B. goodwinii
  19. Gen. nov. I sp. nov. I (currently D. aquatica)
  20. Gen. nov. I sp. nov. II (currently D. lacustris)
  21. B. alni
  22. B. roseae
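As a rough illustration of how thresholds on ANIm identity and coverage translate into species and genus groupings, the sketch below joins genomes whose pairwise score meets a threshold in both directions and reports the resulting connected groups. It is not pyani's own classification code. The genome names and scores are invented toy values, and the cutoffs of 0.95 (identity) and 0.50 (coverage) are single values chosen from within the approximate ranges quoted above.

```python
def symmetric(pairs):
    """Expand {(a, b): score} so the score can be looked up in either direction."""
    lookup = {}
    for (a, b), score in pairs.items():
        lookup[(a, b)] = score
        lookup[(b, a)] = score
    return lookup


def group_by_threshold(names, scores, threshold):
    """Single-linkage grouping: two genomes are joined when their score is
    at least `threshold` in both directions (ANIm coverage can be asymmetric)."""
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if min(scores[(a, b)], scores[(b, a)]) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Toy data (invented): A1 and A2 are the same species, B1 is a second
# species in the same genus, and C1 belongs to a different genus.
names = ["A1", "A2", "B1", "C1"]
identity = symmetric({
    ("A1", "A2"): 0.98, ("A1", "B1"): 0.90, ("A2", "B1"): 0.90,
    ("A1", "C1"): 0.85, ("A2", "C1"): 0.85, ("B1", "C1"): 0.85,
})
coverage = symmetric({
    ("A1", "A2"): 0.90, ("A1", "B1"): 0.70, ("A2", "B1"): 0.70,
    ("A1", "C1"): 0.20, ("A2", "C1"): 0.20, ("B1", "C1"): 0.20,
})

species = group_by_threshold(names, identity, 0.95)
genera = group_by_threshold(names, coverage, 0.50)
```

Single-linkage grouping is only one possible reading of a threshold; with real data, borderline pairs near the 94-96% or 40-50% ranges would still need inspection by eye.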

Multigene Phylogenomics

A total of 1201 single-copy orthologues were identified as present in the predicted proteomes (amino acid sequences) of all 49 genomes, using orthofinder v2.5.2 (Emms & Kelly 2019). The protein sequences for each of the 1201 genes were aligned using MAFFT v7.480 (Nakamura et al. 2018) and the corresponding CDS sequences threaded onto these alignments using t-coffee v12.00.7fb08c2 (Notredame et al. 2000). The nucleotide alignments were concatenated into a single sequence per genome using the Python script concatenate_cds.py, which also generated a partition file (one partition per gene) for the subsequent maximum-likelihood phylogenetic reconstruction.
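The concatenation step can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not the contents of concatenate_cds.py: the function name, the in-memory input layout (gene name mapped to a dictionary of genome name and aligned sequence), and the default model string `GTR+G` are all hypothetical. The partition lines follow the `MODEL, name = start-end` form accepted by raxml-ng.

```python
def concatenate_alignments(alignments, model="GTR+G"):
    """Concatenate per-gene nucleotide alignments into one sequence per genome.

    alignments: {gene_name: {genome_name: aligned_sequence}}
    Returns (concatenated, partitions), where `partitions` holds one
    'MODEL, gene = start-end' line per gene, using 1-based coordinates.
    """
    genomes = sorted(next(iter(alignments.values())))
    pieces = {genome: [] for genome in genomes}
    partitions = []
    start = 1

    for gene in sorted(alignments):
        aln = alignments[gene]
        # Single-copy orthologues: every genome must appear exactly once.
        if sorted(aln) != genomes:
            raise ValueError(f"{gene}: genome set differs from other genes")
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences are not all the same length")
        length = lengths.pop()

        for genome in genomes:
            pieces[genome].append(aln[genome])

        end = start + length - 1
        partitions.append(f"{model}, {gene} = {start}-{end}")
        start = end + 1

    concatenated = {genome: "".join(parts) for genome, parts in pieces.items()}
    return concatenated, partitions


# Toy example with two genes and two genomes (invented sequences).
toy = {
    "geneA": {"g1": "ATG---", "g2": "ATGAAA"},
    "geneB": {"g1": "ATGCCCTAA", "g2": "ATG---TAA"},
}
concatenated, partitions = concatenate_alignments(toy)
```

Iterating over genes and genomes in sorted order keeps the column ranges in the partition file aligned with the concatenated sequences regardless of input order.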

Maximum-Likelihood Phylogenetic Reconstruction

The concatenated nucleotide sequence alignment of 1201 single-copy orthologues and the corresponding partition file were used as input to raxml-ng v1.0.2 (Kozlov et al. 2019). Initial processing with raxml-ng recommended the GTR+FO+G4m+B model for each of the 1201 genes, and the partition file was used to allow individual parameterisation of this model for each gene. Tree searches from 20 starting trees all converged on a single topology, suggesting that this was the globally optimal topology. One hundred bootstrap replicate trees were generated to estimate support values for each branch of the tree; MRE-based bootstopping indicated that convergence was reached after only 50 replicates. The best-scoring tree from the 20 searches was midpoint-rooted, manually annotated and coloured using figtree v1.4.4 (http://tree.bio.ed.ac.uk/software/figtree/).
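The sequence of raxml-ng runs described above might look something like the sketch below, which only builds the command lines. The file names and run prefixes are hypothetical, and the split of the 20 starting trees into 10 parsimony and 10 random trees is an assumption; the exact options used for this analysis are not stated here.

```python
def raxml_commands(msa="concatenated.fasta", partitions="partitions.txt"):
    """Return raxml-ng command lines for parse, search, bootstrap,
    bootstopping, topology comparison and support mapping.
    File names and prefixes are hypothetical placeholders."""
    common = ["--msa", msa, "--model", partitions]
    return {
        # Check the alignment and partition file; reports resource recommendations.
        "parse": ["raxml-ng", "--parse", *common, "--prefix", "01_parse"],
        # ML search from 20 starting trees (assumed: 10 parsimony + 10 random).
        "search": ["raxml-ng", "--search", *common,
                   "--tree", "pars{10},rand{10}", "--prefix", "02_search"],
        # 100 bootstrap replicate trees.
        "bootstrap": ["raxml-ng", "--bootstrap", *common,
                      "--bs-trees", "100", "--prefix", "03_bootstrap"],
        # Bootstopping test on the replicate trees.
        "converge": ["raxml-ng", "--bsconverge",
                     "--bs-trees", "03_bootstrap.raxml.bootstraps",
                     "--prefix", "04_converge"],
        # Robinson-Foulds distances among the ML trees from all searches;
        # all-zero distances indicate a single shared topology.
        "rfdist": ["raxml-ng", "--rfdist",
                   "--tree", "02_search.raxml.mlTrees", "--prefix", "05_rfdist"],
        # Map bootstrap support onto the best-scoring ML tree.
        "support": ["raxml-ng", "--support",
                    "--tree", "02_search.raxml.bestTree",
                    "--bs-trees", "03_bootstrap.raxml.bootstraps",
                    "--prefix", "06_support"],
    }


if __name__ == "__main__":
    for step, cmd in raxml_commands().items():
        print(f"# {step}")
        print(" ".join(cmd))
```

Passing the partition file to `--model` is what allows each gene to receive its own parameterisation of the substitution model.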

The resulting reconstruction supports the same genus and species divisions noted above for whole-genome classification using pyani.

Conclusions

Both ANIm and a comprehensive multigene phylogeny support the same genus and species divisions, including establishment of Musicola as a novel genus.

We note in passing that these approaches also support division of Brenneria into multiple genus-level groups, establishment of a further genus-level group circumscribing genomes currently described as D. aquatica and D. lacustris, and reassignment of members of L. quercina.

References

Emms, D.M. and Kelly, S. (2019) OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biology 20:238

Hyatt, D., Chen, G.L., Locascio, P.F., Land, M.L., Larimer, F.W. and Hauser, L.J. (2010) Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11:119 doi:10.1186/1471-2105-11-119

Kozlov, A.M., Darriba, D., Flouri, T., Morel, B. and Stamatakis, A. (2019) RAxML-NG: a fast, scalable, and user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics, btz305 doi:10.1093/bioinformatics/btz305

Nakamura, T., Yamada, K.D., Tomii, K. and Katoh, K. (2018) Parallelization of MAFFT for large-scale multiple sequence alignments. Bioinformatics 34:2490–2492

Notredame, C., Higgins, D.G. and Heringa, J. (2000) T-Coffee: A novel method for multiple sequence alignments. Journal of Molecular Biology 302:205-217

Pritchard et al. (2016) Genomics and taxonomy in diagnostics for food security: soft-rotting enterobacterial plant pathogens. Anal. Methods 8:12-24 doi:10.1039/C5AY02550H