|
|
||||||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Medical Research Council Virology Unit, Institute of Virology, University of Glasgow, Church Street, Glasgow G11 5JR, UK
Correspondence
Duncan J. McGeoch
d.mcgeoch{at}mrcvu.gla.ac.uk
| ABSTRACT |
|---|
|
|
|---|
Supplementary material describing the production of the optimized colouring schemes for VZV UL SNPs is available with the online version of this paper.
| INTRODUCTION |
|---|
|
|
|---|
|
|
| METHODS |
|---|
|
|
|---|
| RESULTS |
|---|
|
|
|---|
Of the 494 SNPs, 281 represent loci at which one sequence differs from the other 17. Such singleton diversity will in most cases define the extent to which a single isolate has changed since the line leading to it diverged from the most closely related line or lines (a terminal branch that contained the tree's root would constitute a special case, which I consider does not occur in the VZV dataset). Singleton SNPs are not of primary relevance for analysing deeper relationships among lineages. Alleles for the 213 non-singleton SNPs are illustrated in Fig. 2(a)
, in which each SNP locus in each strain was coloured to indicate identity or non-identity to the corresponding locus in the Dumas strain. Inspection showed that 16 of the strains fall easily into three major groups. The uppermost ten sequences as listed in Fig. 2(a)
are close to the Dumas strain, the next four are close to HJO, and the next two (strains DR and 8) form a third group. The last two sequences, for the pOKA and CA123 strains, have distinct SNP patterns and were each treated as a separate group. The five groups defined in this way coincide with groups that had previously been recognized through tree-building comparisons of VZV genomic sequences (Norberg et al., 2006
; Peters et al., 2006
; Loparev et al., 2007
). Several nomenclatures for these groups have been used in published papers and that of Loparev et al. (2007)
is followed here, as listed in Table 1
.
|
Analysis of VZV genomic lineages and crossovers
It is apparent from Fig. 2(b)
that over the cross-group set there exist regions both of extended identity and extended difference between groups, and that patterns of inter-group relationships vary along the genome. I wished to interpret the cross-group SNP set in terms of underlying lineages and of recombinational breakpoints assigned to particular groups. Before describing how this was approached, some comments on the dataset are needed. The SNPs are a binary dataset, so that at most two different lineages can be presented as each being uniformly coloured across a given region. It is implicit that at each SNP locus one allele represents the ancestral state and the other a substitution mutation. Since there are five groups and the dataset was constructed to exclude singletons, every cross-group SNP locus must show one allele in three groups and the other in two groups (excepting the few that show intra-group variability).
Colouring SNPs by reference to any one strain is of course an arbitrary device (and indeed the colouring scheme in Fig. 2(b)
is spuriously suggestive of a notably unlikely scenario: it gives the E2 and M2 groups the appearance of being complex reciprocal recombinants). I attempted to derive a colouring scheme that optimized visible continuity of lineages in terms of extended identity or difference between groups' SNPs. This exercise focused on UL, since the ten cross-group SNPs associated with IRS and US showed high intra-group variation. In order to facilitate recognition of patterns in the array, the SNP set for the five groups was converted to binary form with alleles at each locus coded as 0 and 1 (equivalently to the colouring in Fig. 2b
). Inspection showed that the M1 group was an outlier in terms of defining patterns at single SNP loci that were present at high frequency. Single loci patterns were then examined for the other four groups, showing that the majority of loci for E1 were identical to those of E2 and most of those of M2 were identical to J. The SNP loci that differed from the majority pattern were individually tested for whether reversing their assignments of 0 and 1 symbols would improve net continuity of coding with the neighbouring SNPs, as summed over all five groups. A fuller account is given in the supplementary material available with the online version of this paper. Two optimized colouring schemes were obtained, which differ only in the region of SNPs 47–54; these are shown in Fig. 3(a and b)
. This procedure reduced the number of colour-change boundaries along the UL cross-group SNP array, summed over the five groups (with intra-group variations averaged), from 95 for Dumas-relative colouring to 65 for both optimized schemes. The version in Fig. 3(b)
is used for further discussion because of the general uniformity of colouring it achieves within both E2 and M2 groups (while noting that this was not a planned criterion in the optimization protocol). Fig. 3(c)
shows how the scheme of Fig. 3(b)
transfers to a genomic map of SNP loci.
|
Representation of phylogenetic relationships among VZV UL sequences
In no case do the SNPs that define variation within a multi-member group show any substantive indication of recombination, and a valid phylogenetic tree for each group can thus be made showing relationships among members. However, given the clear indication that crossovers were involved in the genesis of at least some of these contemporary groups, a standard phylogenetic tree that includes all groups would not provide an accurate representation of deep relationships among groups. In this section I use the analyses of SNP data described above to reach a representation that takes recombinational events into account, based on the UL sequences.
Since the UL cross-group SNPs for the E2 and M2 groups are almost completely distinct, a tree based on only these two groups should be nearly free of any complications due to recombination. Trees were derived using the complete UL alignments for the E2 and M2 strains. Initial work used maximum-parsimony programs, which allowed the contributions of the several subsets of SNPs to be readily distinguished, and this was then supplemented by the computationally more sophisticated BMCMC approach. The two methods gave effectively identical results. Because diversity in the genomic alignment is low, allowance for multiple substitutions made by the BMCMC computation is small; the overall branch length of the BMCMC tree was less than 2 % greater than that of its maximum-parsimony counterpart. The tree for E2 plus M2 from BMCMC is shown in Fig. 4
, with the section that represents divergence attributable to cross-group SNPs drawn in red. In the case of the E1 group, a tree was derived using the complete E1 and M2 UL alignments, and treating M2 as outgroup for E1. This E1 tree is valid for the portion corresponding to the E1-group-specific SNPs; that is, for descent from the earliest group-specific ancestor. However, since it appears that the lineage ancestral to E1 arose by recombination events involving the E2 and M2 lineages, the deeper history of the E1 group can only be indicated by sketched connections to those lineages, as depicted in Fig. 4
. It is to be noted that the recombinational genesis of the E1 line evidently took place later than most of the substitutional divergence events defined by the cross-group SNP configurations of the E2 and M2 groups. Lastly, the single-member J and M1 groups each possess unique lineages of substantial relative depth, as indicated by their singleton SNP complements, while the inferred existence of crossover events obscures details of their deeper connections (Fig. 4
).
|
| DISCUSSION |
|---|
|
|
|---|
For the work reported in this paper, it was of particular significance (and a powerfully simplifying circumstance for analysis of deep relationships) that examination of SNP patterns identified the same major groups as had earlier emerged from tree-building. This indicated that the incidence of recombination has been sufficiently low relative to that of nucleotide substitution to allow development of distinct lineages, at least for the UL region, which represents close to 90 % of non-repeated genomic sequence and contributed 87 % of total SNPs in the dataset. Nonetheless, it is clear that genomes constituting recombinant forms between two distinct lineages have arisen. The analysis of possible crossover points described in this paper depended on optimizing continuity of SNP associations over groups, and then locating breakpoints in such associations. The crossovers that were clearly demonstrated in this way are supported by runs of SNPs which are sizeable with respect to overall diversity, and there are also other possible crossover events which are indicated by only one or two SNPs.
Sequence evolution in the S region, especially RS, appears to be more dynamic than that in UL. The incidence of SNPs in RS is 1.5 times higher than in UL (as seen in Fig. 1b
), and the occurrence of SNPs that define diversity within each multi-member group is more marked in both RS and US than in UL (Fig. 2a
). RS is also distinctive in that its G+C content is much higher than that of the unique sequences (59 %, versus 44 and 43 % for UL and US, respectively). Such elevated G+C contents are a general feature of herpesviral genomic repeat regions, and are considered likely to result from mutations introduced or fixed during processes of recombination between repeat copies, perhaps involving biased gene conversion (McGeoch et al., 1986
, 2008
). I interpret the S region's SNP pattern as probably indicative of higher levels, relative to UL, of both point mutation and recombination; since the SNP sets for RS and US are small, no further analysis was attempted. It is worth noting that, in crossovers between distinct VZV strains, events involving RS could well be a favoured class, given the recombinational activity between RS copies in single-strain infection.
Studies of VZV geographical variation have been carried out with strategies that sample genome sequences (for instance Barrett Muir et al., 2002
; Quinlivan et al., 2002
), but available whole genome sequences apparently represent predominantly isolates from Caucasian populations (North American and European). It is striking that two deep lineages (J and M1) are presently represented by just one genome sequence each. The pOKA strain (parent of the OKA vaccine) which constitutes the J group is of Japanese origin, and may be representative of a still largely unexplored subset of VZV strains. It is also noteworthy that no complete VZV genome sequence, to the best of my knowledge, is considered to represent a sub-Saharan African isolate. Taking all into account, it is quite likely that further, uncharacterized deep lineages of VZV exist.
The diversity of VZV genomic sequences is recognized as being low compared with that seen in other human herpesviruses (Quinlivan & Breuer, 2006
). For instance, diversity among genome sequences of herpes simplex virus type 1 is some sixfold higher than that in VZV (results not shown). The low diversity of VZV could result from one or both of two factors: lower rate of genomic change and lesser elapsed time since the most recent common ancestor of the characterized strains (discussed by Barrett Muir et al., 2002
). Addressing such issues may become more feasible with the outline understanding we now have of lineages and recombination.
| ACKNOWLEDGEMENTS |
|---|
| REFERENCES |
|---|
|
|
|---|
Davison, A. J. (1984). Structure of the genome termini of varicella-zoster virus. J Gen Virol 65, 1969–1977.
Davison, A. J. & Scott, J. E. (1986). The complete DNA sequence of varicella-zoster virus. J Gen Virol 67, 1759–1816.
Dumas, A. M., Geelen, J. L. M. C., Weststrate, M. W., Wertheim, P. & van der Noordaa, J. (1981). XbaI, PstI, and BglII restriction enzyme maps of the two orientations of the varicella-zoster virus genome. J Virol 39, 390–400.
Felsenstein, J. (2005). PHYLIP (Phylogeny Inference Package) version 3.6. Distributed by the author. Department of Genome Sciences, University of Washington, Seattle.
Katoh, K., Misawa, K., Kuma, K. & Miyata, T. (2002). MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 30, 3059–3066.
Loparev, V. N., Rubtcova, E. N., Bostik, V., Govil, D., Birch, C. J., Druce, J. D., Schmid, D. S. & Croxson, M. C. (2007). Identification of five major and two minor genotypes of varicella-zoster virus strains: a practical two-amplicon approach used to genotype clinical isolates in Australia and New Zealand. J Virol 81, 12758–12765.
McGeoch, D. J., Dolan, A., Donald, S. & Brauer, D. T. K. (1986). Complete DNA sequence of the short repeat region in the genome of herpes simplex virus type 1. Nucleic Acids Res 14, 1727–1745.
McGeoch, D. J., Davison, A. J., Dolan, A., Gatherer, D. & Sevilla-Reyes, E. E. (2008). Molecular evolution of the Herpesvirales. In Origin and Evolution of Viruses, 2nd edn, pp. 447–475. Edited by E. Domingo, C. R. Parrish & J. J. Holland. London: Academic Press.
Norberg, P., Liljeqvist, J. Å., Bergström, T., Sammons, S., Schmid, D. S. & Loparev, V. N. (2006). Complete-genome phylogenetic approach to varicella-zoster virus evolution: genetic divergence and evidence for recombination. J Virol 80, 9569–9576.
Peters, G. A., Tyler, S. D., Grose, C., Severini, A., Gray, M. J., Upton, C. & Tipples, G. A. (2006). A full-genome phylogenetic analysis of varicella-zoster virus reveals a novel origin of replication-based genotyping scheme and evidence of recombination between major circulating clades. J Virol 80, 9850–9860.
Quinlivan, M. & Breuer, J. (2006). Molecular studies of varicella-zoster virus. Rev Med Virol 16, 225–250.[CrossRef][Medline]
Quinlivan, M., Hawrami, K., Barrett-Muir, W., Aaby, P., Arvin, A., Chow, V. T., John, T. J., Matondo, P., Peiris, M. & other authors (2002). The molecular epidemiology of varicella-zoster virus: evidence for geographical segregation. J Infect Dis 186, 888–894.[CrossRef][Medline]
Ronquist, F. & Huelsenbeck, J. P. (2003). MrBayes 3: Bayesian inference under mixed methods. Bioinformatics 19, 1572–1574.
Sauerbrei, A., Zell, R., Harder, M. & Wutzler, P. (2006). Genotyping of different varicella vaccine strains. J Clin Virol 37, 109–117.[CrossRef][Medline]
Tyler, S. D., Peter, G. A., Grose, C., Severini, A., Gray, M. J., Upton, U. & Tipples, G. A. (2007). Genomic cartography of varicella-zoster virus: a complete genome-based analysis of strain variability with implications for attenuation and phenotypic differences. Virology 359, 447–458.[CrossRef][Medline]
Received 30 September 2008;
accepted 18 December 2008.
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| HOME | HELP | FEEDBACK | SUBSCRIPTIONS | ARCHIVE | SEARCH | TABLE OF CONTENTS |
| INT J SYST EVOL MICROBIOL | MICROBIOLOGY | J GEN VIROL |
| J MED MICROBIOL | ALL SGM JOURNALS | |