
1 Department of Food Safety Science, Institute of Food Research, Norwich Research Park, Colney Lane, Norwich NR4 7UA, UK
2 Evolutionary Biology Group, Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, UK
3 Computational Biology Group, John Innes Centre, Norwich Research Park, Colney Lane, Norwich NR4 7HA, UK
4 School of Computing Sciences, University of East Anglia, Norwich NR4 7TJ, UK
Correspondence
Ian N. Roberts
Ian.Roberts{at}bbsrc.ac.uk
| ABSTRACT |
|---|
Supplementary material is available in JGV Online.
Present address: School of Information Technologies and Sydney University Biological Informatics and Technology Centre (SUBIT), University of Sydney, Camperdown, NSW 2006, Australia.
| INTRODUCTION |
|---|
Complete genomic sequences of caliciviruses are available from the GenBank database (Benson et al., 2002) for several mammalian hosts. These host species include cow, hare, rabbit, human, cat, sea lion and pig. Partial genome sequences are available for various other host species, but derive from varying parts of the genome and so cannot be aligned for use in phylogenetic studies. A much larger number of sequences from the polymerase and capsid protein genes are also available and have been used extensively for phylogenetic analysis (Berke et al., 1997; Green et al., 2000), resulting in the definition of a number of genogroups (Berke et al., 1997). However, the co-phylogeny of these viruses with their hosts and their demographic growth have yet to be investigated in any detail. Demographic histories of viruses vary greatly (Yusmin et al., 2001; Twiddy et al., 2003). With the availability of dated sequence information and the introduction of coalescent methods (Kingman, 1982; see Methods), it is now possible to estimate (i) nucleotide substitution rate, (ii) the date of the most recent common ancestor (MRCA) and (iii) the population size over time (demographic growth) for a dataset of interest (Jenkins et al., 2002). Estimates of all three of these parameters are reliant on the existence of a molecular clock of evolution.
Zoonotic transfer of viruses is a constant threat, both from the food chain (Slifko et al., 2000) and via direct transfer (Meslin et al., 2000). Recent high-profile examples include a variety of RNA viruses, notably Influenza A virus (Hong Kong chicken flu H5N1) (Yuen et al., 1998), Ebola and Marburg viruses (Schou & Hansen, 2000) and severe acute respiratory syndrome coronavirus (Stavrinides & Guttman, 2004). In this paper, we have extended analysis of the calicivirus genome and sequence evolution by relating it to host phylogeny. We have also examined the origin of the noroviruses and comment on the nature of cross-species and cross-population transfer in the history of food-borne virus evolution and with regard to the potential future relationship of these viruses with our own species.
| METHODS |
|---|
Co-phylogeny analysis.
Calicivirus genomic sequences for seven mammalian hosts were obtained from GenBank. The sequences were aligned using the CLUSTAL_X software (Thompson et al., 1997) and a phylogenetic tree was obtained using the DNAMLK option of the PHYLIP software package (Felsenstein, 1989; using the model as described by Felsenstein & Churchill, 1996). Although the molecular clock model, which is required by the co-phylogeny software we will introduce below, is not the optimal model for this dataset, the resulting topology of the phylogeny is broadly similar to that obtained by the no-clock model. The basic topology of the host tree was obtained from the Tree of Life Web Project (Maddison et al., 2001). The dates of the splits in the tree were obtained from literature sources (Kumar & Hedges, 1998; Bininda-Emonds et al., 1999; Lui et al., 2001; Murphy et al., 2001).
The TreeMap software (http://www.it.usyd.edu.au/~mcharles) for host–parasite co-evolution (Charleston & Page, 1998) was used to estimate the co-phylogeny of the virus and host trees. All potential optimal solutions were examined and the probability of each co-phylogeny having arisen by chance was calculated using the TreeMap randomization test. We also tested the significance of the number of co-divergent events (CEs) and non-co-divergent events (NCEs) observed in the optimal solutions. One thousand sets of random parasite trees were generated separately for each test. For the first set, the maximum number of CEs and, for the second, the minimum number of NCEs was calculated when analysed with our fixed host tree. The observed values of CEs and NCEs from the optimal solutions were then compared with the corresponding distribution obtained from the randomly generated trees. These tests gave a further indication of whether our optimal trees could have arisen by chance alone.
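The comparison of an observed event count with a distribution from random trees can be sketched as an empirical P value. This is a minimal illustration only, not the TreeMap implementation: the function name `randomization_p_value` and the toy counts are hypothetical, and the code assumes that each random parasite tree has already been reduced to a single score (maximum CEs or minimum NCEs).

```python
def randomization_p_value(observed, null_values, larger_is_more_extreme=True):
    """Fraction of random-tree scores that are at least as extreme as the observed score."""
    if not null_values:
        raise ValueError("null_values must not be empty")
    if larger_is_more_extreme:
        # CEs: chance agreement would have to produce as many or more co-divergences.
        hits = sum(1 for v in null_values if v >= observed)
    else:
        # NCEs: chance agreement would have to produce as few or fewer non-co-divergences.
        hits = sum(1 for v in null_values if v <= observed)
    return hits / len(null_values)


# Toy scores for ten random trees (hypothetical values, not data from the study).
toy_max_ce = [1, 2, 2, 3, 1, 0, 2, 3, 4, 2]
toy_min_nce = [9, 8, 8, 7, 9, 10, 8, 7, 6, 8]

p_ce = randomization_p_value(4, toy_max_ce, larger_is_more_extreme=True)
p_nce = randomization_p_value(6, toy_min_nce, larger_is_more_extreme=False)
print(p_ce, p_nce)
```

A small P value indicates that few random parasite trees fit the fixed host tree as well as the observed tree does. With 1000 random trees, the smallest non-zero P value this estimator can report is 0·001.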
The analysis outlined above was carried out on complete calicivirus genome sequences. In addition, for each of our species, we extracted the capsid sequence from the genomic sequence and analysed these capsid sequences using a similar strategy. Finally, we compared the two sets of results. We did this in order to compare our whole-genome results with those of rapidly evolving sequences, possibly under high selective pressure, where the evolutionary picture might be different.
Tree dating analysis.
Viral strains associated with an outbreak will sometimes contain the date of the outbreak in the annotation of their sequences. This extra information may be used in the phylogenetic analysis process. A phylogenetic tree containing sequences with known isolation dates (dated tips) has two scales: the timescale of the tree, measured in years, and the branch lengths, measured in number of substitutions per site. If we assume a single rate of evolution across the tree, branch length becomes a linear function of elapsed time, with the substitution rate as the slope. Consequently, we can use the time between dated tips to calibrate the clock of the entire tree and hence estimate the date of the MRCA and the substitution rate.
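The linear relationship can be illustrated with a simple least-squares fit of root-to-tip distance against isolation date. This is a simplified stand-in, not the maximum-likelihood procedure used in the analysis: the function name `root_to_tip_regression` and the toy dates and distances are hypothetical, and the sketch assumes that root-to-tip distances have already been read from a rooted tree.

```python
def root_to_tip_regression(dates, distances):
    """Fit distance = rate * (date - t_mrca) by ordinary least squares.

    dates      -- isolation year of each tip
    distances  -- root-to-tip distance of each tip (substitutions per site)
    Returns (rate, t_mrca): the slope and the date at which the fitted line reaches zero.
    """
    if len(dates) != len(distances) or len(dates) < 2:
        raise ValueError("need at least two (date, distance) pairs")
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    if sxx == 0:
        raise ValueError("all tips share one date, so the clock cannot be calibrated")
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    rate = sxy / sxx                      # substitutions per site per year
    intercept = mean_d - rate * mean_t    # fitted distance at year 0
    t_mrca = -intercept / rate            # year at which fitted distance is zero
    return rate, t_mrca


# Toy data generated from rate = 0.002 and an MRCA in 1900 (hypothetical values).
toy_dates = [1990, 1995, 2000, 2004]
toy_distances = [0.002 * (year - 1900) for year in toy_dates]
rate, t_mrca = root_to_tip_regression(toy_dates, toy_distances)
print(rate, t_mrca)
```

The sketch also shows why a narrow range of tip dates is a problem: when the dates span only a few years, small errors in the distances translate into large errors in the slope, and therefore in the extrapolated MRCA date.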
The TipDate program (Rambaut, 2000) was used to carry out this process. We used TipDate to try to estimate substitution rates and the age of the calicivirus and norovirus MRCAs using dated capsid sequences. Several norovirus capsid sequences, isolated from humans, are associated with known isolation dates. These capsid sequences were used to examine the age of the norovirus and calicivirus trees. Two datasets were assembled.
Dataset 1 consisted of 30 norovirus capsid sequences from dated outbreaks. Although more sequences were available, we restricted the size of the dataset to make analyses manageable. However, in order to ensure that our estimates of the substitution rate and MRCA were representative of the noroviruses as a whole, we chose the subset of sequences that represented the broadest available range of dated viruses.
Dataset 2 comprised 21 dated norovirus capsid sequences, which were known to form a closely related clade within a larger dataset, plus capsid sequences from each of the complete calicivirus genomes used in the earlier co-phylogeny mapping analysis. It was necessary to form this additional dataset in order to examine both noroviruses as an isolated genus (dataset 1) and the caliciviruses as a whole family (this dataset). For those complete genomes where no isolation date was given, the submission year to GenBank was used in its place. Although potential differences between the submission date and the date of outbreak may bias our results, the use of submission dates would be expected to provide a reasonable indication of whether the caliciviruses as a whole have an ancient or a recent origin.
For each of our two datasets, the sequences were aligned using CLUSTAL_X (Thompson et al., 1997). Maximum-likelihood trees were estimated using the DNAML option in PHYLIP (Felsenstein, 1989; using the model as described by Felsenstein & Churchill, 1996) (see Supplementary material available in JGV Online). The capsid sequences in dataset 1 were rooted using the outgroup Vesicular exanthema of swine virus (VESV), whilst those in dataset 2 were rooted using Hepatitis E virus. VESV was chosen for dataset 1 as it occurs in the same family as the noroviruses, but in a different genus. Hepatitis E virus was chosen as the outgroup for dataset 2 as it is a closely related virus, but is classified outside the family Caliciviridae. Hence, both outgroups were similar enough to align with their respective datasets, but divergent enough to be grouped, taxonomically, outside all of the clades found in their respective datasets.
Using TipDate and the phylogenetic trees obtained above as input, three models were tested: single rate (SR), single rate dated tips (SRDT) and different rate (DR). SR, the simplest model, is the single rate (molecular clock) model where the sequences are considered to be contemporaneous. SRDT is the single rate model that takes into account non-contemporaneous sequences. The DR model, the most complex, permits each branch of the tree to have a different rate of substitution. The outputs of the models differ in that the SRDT model estimates the date of the MRCA, the substitution rate and the likelihood of the tree under the model, whilst SR and DR output the likelihood of the tree only. Additionally, for the SRDT model, 95 % confidence intervals (CI) for the substitution rate and MRCA were obtained by finding the parameter estimates that gave a log likelihood score of 1·92 less than the maximum value (Jenkins et al., 2002). Each model used the HKY85 model of nucleotide substitution (Hasegawa et al., 1985), as this contains an intermediate number of parameters, ensuring that the analysis would not be significantly under- or overparameterized. Both homogeneous (Homo) and heterogeneous (Hetero) rate variation between sites were examined for all three models (e.g. SRDT Homo and SRDT Hetero), where the latter was assumed to have a gamma distribution with four rate categories and an alpha shape parameter of 0·5 (as used by Chare et al., 2003). The maximum log likelihood (ln L) of the tree under each model was noted and a likelihood ratio test (LRT) was carried out between SR and DR and between SRDT and DR for both the homogeneous and the heterogeneous rate cases.
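The LRT and the 1·92 cut-off can be sketched as follows. This is an illustrative calculation, not TipDate output: the function names and the example log likelihoods are hypothetical, and the degrees of freedom (the difference in the number of free parameters between the two nested models) must be supplied by the user. The cut-off of 1·92 log-likelihood units is half of 3·84, the 95 % critical value of a chi-squared distribution with one degree of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for the two base cases: df = 1 (odd) and df = 2 (even).
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1
    else:
        q, k = math.exp(-half), 2
    # Step up two degrees of freedom at a time using
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1), with a = k / 2 and z = x / 2.
    while k < df:
        a = k / 2.0
        q += math.exp(a * math.log(half) - half - math.lgamma(a + 1.0))
        k += 2
    return q


def likelihood_ratio_test(lnl_simple, lnl_complex, df):
    """Return (statistic, P value) for two nested models, given their maximum ln L values."""
    stat = 2.0 * (lnl_complex - lnl_simple)
    return stat, chi2_sf(stat, df)


# Hypothetical ln L values: the complex model is better by exactly 1.92 units.
stat, p = likelihood_ratio_test(-100.0, -98.08, df=1)
print(stat, p)
```

A P value below 0·05 means that the simpler model (SR or SRDT) is rejected in favour of the more complex one (DR). In the toy call the statistic is 3·84 and the P value is approximately 0·05, which is the boundary used for the 95 % CI.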
| RESULTS |
|---|
Tree dating analysis
In the analysis of dataset 1, 30 noroviruses were used to estimate a substitution rate and the age of the MRCA. Of the two SRDT models (see Methods), SRDT Hetero gave a significantly better fit to the data than SRDT Homo, with estimates of 1678 years for the MRCA and 0·002707 for the number of substitutions per site per year (Table 1a). However, although the SRDT Hetero model provided a significantly better fit to the data than the SR Hetero model, the DR Hetero model provided the best overall fit. This is reflected in a lower confidence limit of 0 for the substitution rate and the corresponding upper limit of infinity for the MRCA in the SRDT Hetero model. In the analysis of dataset 2, the capsid sequences of the calicivirus genomes used in the co-phylogeny analysis were added to a known clade of 21 noroviruses. Again, of the two SRDT models, SRDT Hetero provided the better fit to the data, with estimates of 5127 years for the MRCA and 0·002688 for the number of substitutions per site per year (Table 1b). However, as for dataset 1, the DR Hetero model provided the best overall fit.
If we assume that the MRCA estimated by the SRDT Hetero model is approximately correct at 5000 years and the sequence data used in this analysis span a series of outbreaks covering a 14 year period, we are looking at perhaps only 0·3 % of the length of the tree. Ideally, a much wider range of sequence isolation dates would have been preferred, especially from closely related strains. Although they are not available at present, it may be possible to obtain these sequences and to carry out a more thorough analysis in future years. In addition, it would be interesting to examine a number of dated complete-genome sequences, rather than capsid sequences. The norovirus capsid sequences are likely to have evolved faster than non-structural proteins, thus influencing the results obtained. In conclusion, although it is possible that the norovirus and calicivirus sequences analysed have arisen within the last 5000 years, we have not been able to provide conclusive evidence for this age and hence are unable to estimate dates of the host switches discovered in the previous analysis.
| DISCUSSION |
|---|
In the absence of a valid substitution rate, it is also difficult to draw conclusions concerning the demographic growth of viruses compared with their hosts and thus to make predictions concerning food safety. In future years, it will become possible to obtain dated sequences from caliciviruses that cover a greater time period, providing better information on the substitution rate and MRCA. As this occurs, it will be interesting to examine the demographic growth of the human-related noroviruses to examine what factors influence the spread of the viruses, e.g. did the growth of the viruses increase in line with human population growth or perhaps emerge with the first large-scale movements of the human race around the world? The type of demographic growth observed, e.g. exponential, logistic or piece-wise logistic (Pybus & Rambaut, 2002), will provide valuable information for food-safety strategies.
Throughout our analysis, we could find no evidence of zoonotic behaviour in caliciviruses. In all of the evolutionary co-phylogeny reconstructions examined, none showed any suggestion of non-human-to-human host switches. Of course, we cannot rule out the possibility that the inclusion of more virus sequences from more closely related hosts may indicate the potential for transfer amongst, for example, primate lineages. However, it is unlikely that the more distant relatives investigated in the present study pose a significant threat to humans.
In contrast, this might not be the case for some of the other hosts in our analysis. In the co-phylogeny (Fig. 1), a virus representing an early ancestor of SMSLV jumped from sea lion to pig, evolving into VESV. It has been established for some time that SMSLV is very similar to VESV in many respects and that the respective viral particles are morphologically indistinguishable (Clarke & Lambden, 1997). Furthermore, experimental research (Berry et al., 1990) has shown that pigs are readily infected by SMSLV, resulting in a transmissible vesicular disease. Indeed, it has been proposed that recent epidemics of VESV in pigs may have resulted from the feeding of marine mammal (seal) meat and fish to pigs as a protein supplement during America's Great Depression (1929–1941) (Smith et al., 1998). Co-phylogeny results are in agreement with the conclusion of these authors that there must be continuing concern that a VESV-like disease could reappear in the USA due to the large number of marine mammals on the west coast potentially acting as a marine reservoir for caliciviruses (Smith et al., 1998).
In summary, identifying the date for the MRCA of emergent norovirus strains promises to be useful for understanding the circumstances surrounding their origin and the rate at which their populations may be expected to expand and diverge in the future. This will become increasingly feasible as more dated genomic sequences are added to the databases. It will also be interesting to determine whether other enteric viruses follow similar patterns of evolution and demographic growth.
| ACKNOWLEDGEMENTS |
|---|
| REFERENCES |
|---|
Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J., Rapp, B. A. & Wheeler, D. L. (2002). GenBank. Nucleic Acids Res 30, 17–20.
Berke, T., Golding, B., Jiang, X., Cubitt, D. W., Wolfaardt, M., Smith, A. W. & Matson, D. O. (1997). Phylogenetic analysis of the Caliciviruses. J Med Virol 52, 419–424.
Berry, E. S., Skilling, D. E., Barlough, J. E., Vedros, N. A., Gage, L. J. & Smith, A. W. (1990). New marine calicivirus serotype infective for swine. Am J Vet Res 51, 1184–1187.
Bininda-Emonds, O. R. P., Gittleman, J. L. & Purvis, A. (1999). Building large trees by combining phylogenetic information: a complete phylogeny of the extant Carnivora (Mammalia). Biol Rev Camb Philos Soc 74, 143–175.
Centers for Disease Control & Prevention (2003). Norovirus activity – United States, 2002. MMWR Morb Mortal Wkly Rep 52, 41–45.
Chare, E. R., Gould, E. A. & Holmes, E. C. (2003). Phylogenetic analysis reveals a low rate of homologous recombination in negative-sense RNA viruses. J Gen Virol 84, 2691–2703.
Charleston, M. A. (1998). Jungles: a new solution to the host/parasite phylogeny reconciliation problem. Math Biosci 149, 191–223.
Charleston, M. A. & Page, R. D. M. (1998). TreeMap 2.0. Macintosh program for co-phylogenetic analysis, 2.0 edn. Available at http://evolve.zoo.ox.ac.uk/software.html?id=treemap
Charleston, M. A. & Robertson, D. L. (2002). Preferential host switching by primate lentiviruses can account for phylogenetic similarity with the primate phylogeny. Syst Biol 51, 528–535.
Clarke, I. N. & Lambden, P. R. (1997). The molecular biology of caliciviruses. J Gen Virol 78, 291–301.
Drake, J. W. & Holland, J. J. (1999). Mutation rates among RNA viruses. Proc Natl Acad Sci U S A 96, 13910–13913.
Felsenstein, J. (1989). PHYLIP – Phylogeny Inference Package (Version 3.2). Cladistics 5, 164–166.
Felsenstein, J. & Churchill, G. A. (1996). A hidden Markov model approach to variation among sites in rate of evolution. Mol Biol Evol 13, 93–104.
Food Standards Agency (2000). A Report of the Study of Infectious Intestinal Disease in England. London: The Stationery Office.
Green, J., Vinje, J., Gallimore, C. I., Koopmans, M., Hale, A., Brown, D. W., Clegg, J. C. & Chamberlain, J. (2000). Capsid protein diversity among Norwalk-like viruses. Virus Genes 20, 227–236.
Hardy, M. E. & Estes, M. K. (1996). Completion of the Norwalk virus genome sequence. Virus Genes 12, 287–290.
Hasegawa, M., Kishino, H. & Yano, T. (1985). Dating of the human–ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol 22, 160–174.
Holmes, E. C. (2003). Molecular clocks and the puzzle of RNA virus origins. J Virol 77, 3893–3897.
Jenkins, M. J., Rambaut, A., Pybus, O. G. & Holmes, E. C. (2002). Rates of molecular evolution in RNA viruses: a quantitative phylogenetic analysis. J Mol Evol 54, 156–165.
Kingman, J. F. C. (1982). On the genealogy of large populations. J Appl Probab 19A, 27–43.
Koopmans, M. & Duizer, E. (2004). Foodborne viruses: an emerging problem. Int J Food Microbiol 90, 23–41.
Kumar, S. & Hedges, S. B. (1998). A molecular timescale for vertebrate evolution. Nature 392, 917–920.
Lui, F.-G. R., Miyamoto, M. M., Freire, N. P., Ong, P. Q., Tennant, M. R., Young, T. S. & Gugel, K. F. (2001). Molecular and morphological supertrees for eutherian (placental) mammals. Science 291, 1786–1789.
Maddison, D. R., Maddison, W. P., Schulz, K. S., Wheeler, T. & Frumkin, J. (2001). Tree of life web project. Available at: http://tolweb.org.
Meslin, F. X., Stohr, K. & Heymann, D. (2000). Public health implications of emerging zoonoses. Rev Sci Tech 19, 310–317.
Murphy, W. J., Elzirik, E., Johnson, W. E., Zhang, Y. P., Ryder, O. A. & O'Brien, S. J. (2001). Molecular phylogenetics and the origins of placental mammals. Nature 409, 614–618.
Pybus, O. G. & Rambaut, A. (2002). GENIE: estimating demographic history from molecular phylogenies. Bioinformatics 18, 1404–1405.
Rambaut, A. (2000). Estimating the rate of molecular evolution: incorporating non-contemporaneous sequences into maximum likelihood estimates. Bioinformatics 16, 395–399.
Schou, S. & Hansen, A. K. (2000). Marburg and Ebola virus infections in laboratory non-human primates: a literature review. Comp Med 50, 108–123.
Slifko, T. R., Smith, H. V. & Rose, J. B. (2000). Emerging parasite zoonoses associated with water and food. Int J Parasitol 30, 1379–1393.
Smith, A. W., Skilling, D. E., Cherry, N., Mead, J. H. & Matson, D. O. (1998). Calicivirus emergence from ocean reservoirs: zoonotic and interspecies movements. Emerg Infect Dis 4, 13–20.
Stavrinides, J. & Guttman, D. S. (2004). Mosaic evolution of the severe acute respiratory syndrome coronavirus. J Virol 78, 76–82.
Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F. & Higgins, D. G. (1997). The Clustal_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25, 4876–4882.
Twiddy, S. S., Pybus, O. G. & Holmes, E. C. (2003). Comparative population dynamics of mosquito-borne flaviviruses. Infect Genet Evol 3, 87–95.
van Regenmortel, M. H. V., Fauquet, C. M., Bishop, D. H. L. & 8 other editors (2000). Virus Taxonomy: Classification and Nomenclature of Viruses: Seventh Report of the International Committee on Taxonomy of Viruses. San Diego: Academic Press.
Yuen, K. Y., Chan, P. K. S., Peiris, M. & 8 other authors (1998). Clinical features and rapid viral diagnosis of human disease associated with avian influenza A H5N1 virus. Lancet 351, 467–471.
Yusmin, K., Peeters, M., Pybus, O. G., Bhattacharya, T., Delaporte, E., Mulanga, C., Muldoon, M., Theiler, J. & Korber, B. (2001). Using human immunodeficiency virus type 1 sequences to infer historical features of the acquired immune deficiency syndrome epidemic and human immunodeficiency virus evolution. Philos Trans R Soc Lond B Biol Sci 356, 855–866.
Received 26 October 2005;
accepted 22 December 2005.