1 Department of Biology, Georgia State University, Atlanta, GA, USA
2 Department of Developmental Medical Sciences, Institute of International Health, Graduate School of Medicine, The University of Tokyo, Tokyo, Japan
Correspondence
Teryl K. Frey
tfrey{at}gsu.edu
| ABSTRACT |
|---|
Supplementary tables are available in JGV Online.
| INTRODUCTION |
|---|
Although rubella occurs worldwide, vaccination efforts with live-attenuated vaccines have been concentrated in developed countries. Currently, approximately 50 % of countries have national vaccination efforts against rubella (Robertson et al., 2003). Isolation and genetic sequencing of rubella viruses have been most thorough in countries pursuing elimination (Bosma et al., 1996; Frey et al., 1998; Icenogle et al., 2006; Katow, 2004; Katow et al., 1997a, b; Reef et al., 2002; Saitoh et al., 2006); however, collections have recently been assembled from other regions of the world (Donadio et al., 2003; Katow, 2004; Zheng et al., 2003a, c). Recently, a standard taxonomy for rubella viruses was adopted based on sequences of a standard window within the E1 gene and supported by sequencing of the SP-ORF of selected viruses (WHO, 2005). The taxonomy consists of two clades [corresponding to the previous genotypes I and II (Frey et al., 1998; Zheng et al., 2003a)] containing a total of ten genotypes, seven in clade 1 (1a, 1B, 1C, 1D, 1E, 1F and 1g) and three in clade 2 (2A, 2B and 2c); the genotypes designated in lower case are provisional. Within the E1 gene, maximal variation among clade 1 viruses is 5.8 %, that among clade 2 viruses is 8.0 %, and it is 8.2 % between the two clades (Zheng et al., 2003a). Geographically, clade 1 viruses circulate worldwide, whilst clade 2 viruses thus far have been restricted to Eurasia.
Thus far, ten complete genomic sequences of Rubella virus have been reported, which represent only two of the ten genotypes (eight sequences of genotype 1a viruses and two sequences of genotype 2A viruses). Among these sequences, genomic genetic variability is similar to that in the E1 gene, with the exception of a hypervariable region (HVR) of greater variability in the middle of the P150 non-structural protein gene (Hofmann et al., 2003; Zheng et al., 2003b). Given the lack of representation of the majority of the genotypes in the current genomic database, the first goal of this study was to expand the number of genomic sequences, using viruses in our collection from six additional genotypes (1B, 1C, 1D, 1E, 2B and 2c). The second goal of this study was to extend phylogenetic analysis to 5' regions of the genome, which had not previously been done. To this end, the sequence of the non-structural protease-encoding region within the P150 gene was determined and compared phylogenetically with the sequences of the junction region and the E1 gene from 43 viruses representing eight genotypes.
| METHODS |
|---|
model) was used as a substitution model for phylogenetic reconstruction, as it was found statistically to be the best fit for our datasets. Maximum-likelihood (ML) phylogenetic analysis was performed by using the TREE-PUZZLE program version 5.2 (Strimmer & von Haeseler, 1996
substitution model with an initial neighbour-joining tree and then the best ML tree was reconstructed with these optimized parameters by using the quartet-puzzling method in the TREE-PUZZLE program. Fifty thousand and one hundred thousand quartet-puzzling steps were performed in constructing trees from the 19 genomic and 43 genomic region sequences, respectively. Sequence similarities and observed distances were calculated by using the Old Distance program in the GCG software package. A nucleotide sequence similarity plot across the genome (100 nt window) was generated by using the PLOTSIMILARITY program in the GCG software package. As the sequences of genotype 1a viruses were over-represented, the plot was generated by using six sequences from each clade, including members from each genotype. Nucleotide sequence substitution-rate analysis was carried out with PILEUP (GCG package), fastDNAml (version 1.2.2) and DNArates (version 1.1.0), employing default parameters. To detect recombination, phylogenetic analysis of sequences on either side of putative break points was conducted by using TREE-PUZZLE with the same parameter settings as were used in the genomic sequence analysis. Recombination was also analysed by using the sequence recombination-detection programs TOPALi (Milne et al., 2004
| RESULTS |
|---|
Among the 19 genomic sequences, 78 % of the nucleotides were invariant. Not surprisingly, the parameter of rate heterogeneity, α, was 0.22 for the entire genome and varied between 0.19 and 0.33 across the genomic regions, with the exception of the HVR, within which α=1.35. These small α values indicated a strong substitution-rate heterogeneity among nucleotide sites across most of the genome (i.e. more than three-quarters of the nucleotides remained constant, whilst fewer than one-quarter exhibited variability). Within the HVR, 46 % of the nucleotides were variable.
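The link between a small rate-heterogeneity value and a mostly invariant genome can be illustrated numerically. The sketch below assumes that the reported parameter is the shape parameter of a gamma distribution of site rates with mean 1 (the usual convention, written α); the function names are illustrative and are not part of TREE-PUZZLE or any other program used in the study.

```python
import math


def gamma_rate_density(rate, alpha):
    """Density at `rate` (> 0) of a gamma distribution with shape alpha and mean 1."""
    return (alpha ** alpha) * (rate ** (alpha - 1.0)) * math.exp(-alpha * rate) / math.gamma(alpha)


def density_mode(alpha):
    """Rate at which the density peaks: 0 when alpha <= 1, (alpha - 1) / alpha otherwise."""
    return 0.0 if alpha <= 1.0 else (alpha - 1.0) / alpha


def is_monotone_decreasing(alpha, grid=None):
    """Check on a grid of rates whether the density only falls (an L-shaped curve)."""
    grid = grid or [0.01 * k for k in range(1, 500)]
    values = [gamma_rate_density(r, alpha) for r in grid]
    return all(a > b for a, b in zip(values, values[1:]))


if __name__ == "__main__":
    # 0.22 is the whole-genome value and 1.35 the HVR value quoted in the text.
    for alpha in (0.22, 1.35):
        shape = "L-shaped" if is_monotone_decreasing(alpha) else "peaked"
        print(f"alpha={alpha}: mode at rate {density_mode(alpha):.3f}, {shape}")
```

With a shape value below 1, most of the probability mass sits at rates close to zero, which matches a genome in which most sites do not vary; with a value above 1, the density has an interior peak.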
Phylogenetic analysis
ML phylogenetic trees constructed from the complete genomic sequences, as well as from the NSP- and SP-ORFs, are displayed in Fig. 3. As in the E1-based tree, the six clade 2 sequences formed a clear, consistent branching pattern with high support values in all three trees, indicating that genotypes 2A and 2B are related more closely to each other than to genotype 2c. In clade 1, the groupings of genotypes 1B, 1C, 1D and 1E on the three trees were consistent: the two genotype 1B sequences formed a branch, as did the individual genotype 1D and 1E sequences, whilst the individual genotype 1C sequence extended from the baseline with no relative relationship to other genotypes. On the genomic and SP-ORF trees, the eight genotype 1a sequences grouped into four pairs, indicated in Fig. 3 as a1 (TO-w and TO-v; a wild-type parent and the attenuated vaccine derived from it), a2 (Fth and RA27/3; both isolated in the north-eastern USA in 1964), a3 (CEN and M33; isolated from Europe and the USA in 1961–1962) and a4 (SUR and ULR; both isolated from eastern Europe in 1974 and 1984). Interestingly, on the NSP-ORF tree, M33 separated from CEN (a3) and clustered with the a4 grouping. We also constructed trees by using the sequences of the genes and regions within the NSP-ORF and SP-ORF (data not shown), with the result that they had the same general topology as the ORF-generated trees. The exception was the HVR-generated tree, on which each of the clade 1 viruses formed an individual branch, apart from the a1 and M33-SUR groupings, which were preserved.
| DISCUSSION |
|---|
A sequence-similarity profile revealed, with two exceptions, a comparable pattern across the genome with local windows of similarity and dissimilarity varying about a relatively uniform mean, indicating that most genomic regions, including both virion protein and replicase protein genes, were equally divergent. Both observed and genetic distances between these genomic regions were comparable. The two exceptions were both within the P150 gene, with the N-terminal MT domain exhibiting less variability and the internal HVR exhibiting greater variability. Although the MT domain was predicted to encode both methyl- and guanylyltransferase activities (Rozanov et al., 1992; neither activity has been demonstrated experimentally), the fact that 90 % of the nucleotide residues within this region are conserved raises the possibility that this region serves as a CAE in addition to encoding protein sequence. Consistent with this possibility, the phenotype of a cell culture-potentiating mutation discovered at nt 164 of the RA27/3 genome was found to be due to the nucleotide itself rather than to the encoded amino acid (Pugachev et al., 2000). Conservation of the MT domain sequence has also been observed in other alpha-like family viruses (Gouvea et al., 1998). The HVR encodes a proline- and arginine-rich domain of P150 termed the proline hinge (Koonin et al., 1992), although this domain contains several adaptor motifs that could serve to facilitate the association of P150 with other proteins. If this domain serves as a structural hinge between functional domains within the P150 protein, this could explain the lower constraint on sequence conservation within the HVR in comparison with the rest of the genome. On the other hand, Hofmann et al. (2003) reported data suggesting that the HVR among clade 1 viruses was under positive selection at the amino acid level. It should be pointed out that hypervariable region is a relative term, in that HVRs in the genomes of other viruses are often more variable than the rubella virus HVR. For example, in the hepatitis E virus HVR, variability is >50 % (Arankalle et al., 1999; Gouvea et al., 1998; Nishizawa et al., 2003; van Cuyck et al., 2003).
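A windowed similarity profile of the kind described above can be sketched in a few lines. This is only an illustration of the idea, assuming an ungapped alignment given as equal-length strings and scoring each column by mean pairwise identity; it is not the algorithm of the GCG PLOTSIMILARITY program, and the function names are hypothetical.

```python
from itertools import combinations


def column_pairwise_identity(column):
    """Fraction of sequence pairs that carry the same residue in one alignment column."""
    pairs = list(combinations(column, 2))
    return sum(a == b for a, b in pairs) / len(pairs)


def windowed_similarity(alignment, window=100, step=1):
    """Return (window_start, mean_identity) tuples along an alignment of equal-length strings."""
    if len(alignment) < 2:
        raise ValueError("need at least two sequences")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    scores = [column_pairwise_identity([seq[i] for seq in alignment]) for i in range(length)]
    profile = []
    for start in range(0, length - window + 1, step):
        profile.append((start, sum(scores[start:start + window]) / window))
    return profile


if __name__ == "__main__":
    toy = ["AAAAAA", "AAAAAT", "AAAAAT"]
    for start, score in windowed_similarity(toy, window=3):
        print(start, round(score, 3))
```

Regions such as a conserved domain or a hypervariable region would then appear as sustained runs of windows above or below the genome-wide mean.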
With the exception of the HVR, nucleotide residues or sites across the genome showed a strong heterogeneity in rate of divergence, as indicated by the low value of the rate-heterogeneity parameter α. Sequence collections with low α values exhibit an L-shaped distribution on a graph of number of sites versus rate of divergence, rather than the bell-shaped curve generated when α is >1. The low α value reflects the fact that roughly 80 % of the residues in the rubella virus genome were invariant in this collection of genomic sequences. The percentage of invariant residues at first and second codon positions was 93 %, compared with 48 % at third codon positions (Y. Zhou, unpublished data), and thus maintenance of amino acid sequence is a substantial component of the conservation of nucleotide sequence. Among third codon positions, the G+C content was 81 mol%, compared with 63 mol% among first and second codon positions (Y. Zhou, unpublished data), and thus there was selection for G and C residues. This selection was also evident in the HVR, within which the G+C content was 81 mol%, compared with 70 mol% for the genome.
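The per-codon-position figures quoted above (fraction of invariant sites and G+C content) come from straightforward column counts. A minimal sketch follows, assuming an ungapped, in-frame alignment of coding sequences supplied as equal-length strings; the function name and the toy data are illustrative only.

```python
def codon_position_stats(alignment):
    """Map codon position (1, 2, 3) to (fraction of invariant columns, G+C fraction)."""
    length = len(alignment[0])
    if length % 3 != 0 or any(len(seq) != length for seq in alignment):
        raise ValueError("expected equal-length, in-frame coding sequences")
    stats = {}
    for offset in (0, 1, 2):
        columns = range(offset, length, 3)
        # A column is invariant when every sequence carries the same base there.
        invariant = sum(1 for i in columns if len({seq[i] for seq in alignment}) == 1)
        bases = [seq[i] for seq in alignment for i in columns]
        gc = sum(1 for base in bases if base in "GC")
        stats[offset + 1] = (invariant / len(columns), gc / len(bases))
    return stats


if __name__ == "__main__":
    toy = ["ATGGCC", "ATGGCG"]
    for position, (invariant, gc) in codon_position_stats(toy).items():
        print(f"codon position {position}: invariant={invariant:.2f}, G+C={gc:.2f}")
```

In the toy alignment the only variable column is a third codon position, mirroring the pattern reported for the genome, where third positions are both the least conserved and the most G+C-rich.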
Among the nucleotide substitutions at the 20 % of genomic sites that exhibited variability, transitions were strikingly more abundant than transversions; across the genome, the transition to transversion ratio, K, was 7.0 and varied among genomic regions from 4.5 to 13.4. Thus, the rubella virus genome exhibited the transition over transversion preference that has been well documented in DNA genomes (Meyer et al., 1999; Salemi & Vandamme, 2003). This preference has been attributed to the facts that transitions are more likely to lead to silent mutations in amino acid sequence and that, during replication, it is more likely that a mutation to a nucleotide of equal size (transition) will occur than to a nucleotide of different size (transversion). In RNA genomes, the possibility of both G–C and G–U pairing would also favour transitions in the replication process. Interestingly, pyrimidine transitions were favoured over purine transitions by a ratio (Y/R) of 2.7 across the entire genome; Y/R varied from 0.9 to 3.7 in genomic regions. Within the HVR, the most variable region of the genome, both K and Y/R were lower than for the entire genome and most of the other genomic regions, indicating that the variability in this region was generated in part by relaxation of the genomic preference for pyrimidine transitions over transversions and purine transitions.
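The two ratios discussed above can be illustrated with a simple pairwise count. The sketch below classifies each difference between two aligned sequences as a purine transition (A↔G), a pyrimidine transition (C↔T) or a transversion. It uses raw observed counts with no correction for multiple substitutions, so it is not the model-based estimation that presumably produced the values in the text; all names are illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitutions(seq_a, seq_b):
    """Count purine transitions, pyrimidine transitions and transversions between aligned sequences."""
    counts = {"purine_transitions": 0, "pyrimidine_transitions": 0, "transversions": 0}
    # Treat U as T so that RNA and cDNA sequences are handled alike.
    a_norm = seq_a.upper().replace("U", "T")
    b_norm = seq_b.upper().replace("U", "T")
    for a, b in zip(a_norm, b_norm):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue  # identical sites, gaps and ambiguity codes are skipped
        if a in PURINES and b in PURINES:
            counts["purine_transitions"] += 1
        elif a in PYRIMIDINES and b in PYRIMIDINES:
            counts["pyrimidine_transitions"] += 1
        else:
            counts["transversions"] += 1
    return counts


def substitution_ratios(counts):
    """Return (transition/transversion, pyrimidine/purine transition) ratios; None if undefined."""
    transitions = counts["purine_transitions"] + counts["pyrimidine_transitions"]
    ts_tv = transitions / counts["transversions"] if counts["transversions"] else None
    y_r = (counts["pyrimidine_transitions"] / counts["purine_transitions"]
           if counts["purine_transitions"] else None)
    return ts_tv, y_r


if __name__ == "__main__":
    counts = classify_substitutions("ACGTACGTAC", "GTGCATGTCT")
    print(counts, substitution_ratios(counts))
```

Returning `None` for an undefined ratio keeps short or highly conserved regions, where one class of change may be absent, from raising a division error.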
Phylogenetic analysis of rubella viruses has traditionally been done on the basis of E1 gene or subE1 gene sequences and a standard taxonomy was proposed recently, based on a window within the E1 gene, that was substantiated by using complete SP-ORF sequences (WHO, 2005). The second goal of this study was to extend phylogenetic analysis to the 5' region of the genome and we found that generally comparable trees, in terms of both overall variability and phylogenetic clustering, were generated with sequence windows in the NSP-ORF. The exception was a group of seven genotype 1B viruses that formed a branch in a tree based on NP sequence, but formed two branches on the basis of JR sequence. Intriguingly, a deletion in the junction region of five of these seven viruses did not segregate with the two phylogenetic branches on the JR tree. Analysis revealed a recombinational event, putatively near the 5' end of the C gene, that led to the generation of the two branches on the JR tree. There was one previous report of a natural recombination event in Rubella virus (in the E1 gene; Zheng et al., 2003a), but the origin of the recombinant strain was in doubt because one of the parents was related closely to a commonly used laboratory strain. Thus, this was the first conclusive evidence of rubella virus recombination in nature.
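The logic of comparing relationships on either side of a putative break point can be shown with a deliberately crude screen. The sketch below asks which reference sequence is closest to a query, by uncorrected p-distance, to the left and to the right of a proposed break point. It is only a stand-in for the tree-based comparison and the dedicated recombination-detection software used in the study, and every name and sequence in it is hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of aligned positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def nearest_reference(query, references, start, end):
    """Name of the reference closest to the query within alignment columns [start, end)."""
    return min(references, key=lambda name: p_distance(query[start:end], references[name][start:end]))


def breakpoint_signal(query, references, breakpoint):
    """Return (left neighbour, right neighbour, whether they differ) for one break point."""
    left = nearest_reference(query, references, 0, breakpoint)
    right = nearest_reference(query, references, breakpoint, len(query))
    return left, right, left != right


if __name__ == "__main__":
    # Toy data: the query matches P1 on the left half and P2 on the right half.
    refs = {"P1": "AAAAAAAAAA", "P2": "CCCCCCCCCC"}
    print(breakpoint_signal("AAAAACCCCC", refs, 5))
```

A change of nearest neighbour across the break point is only suggestive; confirming it requires phylogenetic support for the conflicting placements, which is why trees were built separately for each side.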
Interestingly, the E1 sequences of the seven genotype 1B viruses did not cluster on the E1-based tree and, in comparison with the NP- and JR-based trees, this could be due to divergence or additional recombinational events. As can be seen in the tree in Fig. 1, genotype 1B consists of two sub-branches that would not necessarily appear to be related if fewer sequences were employed (e.g. the E1-based tree in Fig. 4). It is also to be noted that all of the WHO reference strains are on one of these sub-branches. Thus, for this genotype, phylogenetic analysis using sequences from the NSP-ORF region of the genome could be useful in assessing relatedness.
| ACKNOWLEDGEMENTS |
|---|
| REFERENCES |
|---|
Bosma, T. J., Best, J. M., Corbett, K. M., Banatvala, J. E. & Starkey, W. G. (1996). Nucleotide sequence analysis of a major antigenic domain of the E1 glycoprotein of 22 rubella virus isolates. J Gen Virol 77, 2523–2530.
Chantler, J. K., Wolinsky, J. S. & Tingle, A. (2001). Rubella virus. In Fields Virology, 4th edn, pp. 963–990. Edited by D. M. Knipe & P. M. Howley. Philadelphia, PA: Lippincott Williams & Wilkins.
Clarke, D. M., Loo, T. W., Hui, I., Chong, P. & Gillam, S. (1987). Nucleotide sequence and in vitro expression of rubella virus 24S subgenomic messenger RNA encoding the structural proteins E1, E2 and C. Nucleic Acids Res 15, 3041–3057.
Dominguez, G., Wang, C. Y. & Frey, T. K. (1990). Sequence of the genome RNA of rubella virus: evidence for genetic rearrangement during togavirus evolution. Virology 177, 225–238.
Donadio, F. F., Siqueira, M. M., Vyse, A., Jin, L. & Oliveira, S. A. (2003). The genomic analysis of rubella virus detected from outbreak and sporadic cases in Rio de Janeiro state, Brazil. J Clin Virol 27, 205–209.
Frey, T. K. (1994). Molecular biology of rubella virus. Adv Virus Res 44, 69–160.
Frey, T. K., Abernathy, E. S., Bosma, T. J., Starkey, W. G., Corbett, K. M., Best, J. M., Katow, S. & Weaver, S. C. (1998). Molecular analysis of rubella virus epidemiology across three continents, North America, Europe, and Asia, 1961–1997. J Infect Dis 178, 642–650.
Gouvea, V., Snellings, N., Popek, M. J., Longer, C. F. & Innis, B. L. (1998). Hepatitis E virus: complete genome sequence and phylogenetic analysis of a Nepali isolate. Virus Res 57, 21–26.
Henikoff, S. & Henikoff, J. G. (1994). Position-based sequence weights. J Mol Biol 243, 574–578.
Hofmann, J., Renz, M., Meyer, S., von Haeseler, A. & Liebert, U. G. (2003). Phylogenetic analysis of rubella virus including new genotype I isolates. Virus Res 96, 123–128.
Huang, F. F., Sun, Z. F., Emerson, S. U., Purcell, R. H., Shivaprasad, H. L., Pierson, F. W., Toth, T. E. & Meng, X. J. (2004). Determination and analysis of the complete genomic sequence of avian hepatitis E virus (avian HEV) and attempts to infect rhesus monkeys with avian HEV. J Gen Virol 85, 1609–1618.
Icenogle, J. P., Frey, T. K., Abernathy, E., Reef, S. E., Schnurr, D. & Stewart, J. A. (2006). Genetic analysis of rubella viruses found in the United States between 1966 and 2004: evidence that indigenous rubella viruses have been eliminated. Clin Infect Dis 43 (Suppl. 3), S133–S140.
Kakizawa, J., Nitta, Y., Yamashita, T., Ushijima, H. & Katow, S. (2001). Mutations of rubella virus vaccine TO-336 strain occurred in the attenuation process of wild progenitor virus. Vaccine 19, 2793–2802.
Kang, S. Y., Yun, S. I., Park, H. S., Park, C. K., Choi, H. S. & Lee, Y. M. (2004). Molecular characterization of PL97-1, the first Korean isolate of the porcine reproductive and respiratory syndrome virus. Virus Res 104, 165–179.
Katow, S. (2004). Molecular epidemiology of rubella virus in Asia: utility for reduction in the burden of diseases due to congenital rubella syndrome. Pediatr Int 46, 207–213.
Katow, S., Minahara, H., Fukushima, M. & Yamaguchi, Y. (1997a). Molecular epidemiology of rubella by nucleotide sequences of the rubella virus E1 gene in three East Asian countries. J Infect Dis 176, 602–616.
Katow, S., Minahara, H., Ota, T. & Fukushima, M. (1997b). Identification of strain-specific nucleotide sequences in E1 and NS4 genes of rubella virus vaccine strains in Japan. Vaccine 15, 1579–1585.
Kinney, R. M., Pfeffer, M., Tsuchiya, K. R., Chang, G. J. & Roehrig, J. T. (1998). Nucleotide sequences of the 26S mRNAs of the viruses defining the Venezuelan equine encephalitis antigenic complex. Am J Trop Med Hyg 59, 952–964.
Koonin, E. V., Gorbalenya, A. E., Purdy, M. A., Rozanov, M. N., Reyes, G. R. & Bradley, D. W. (1992). Computer-assisted assignment of functional domains in the nonstructural polyprotein of hepatitis E virus: delineation of an additional group of positive-strand RNA plant and animal viruses. Proc Natl Acad Sci U S A 89, 8259–8263.
Lund, K. D. & Chantler, J. K. (2000). Mapping of genetic determinants of rubella virus associated with growth in joint tissue. J Virol 74, 796–804.
Meyer, S., Weiss, G. & von Haeseler, A. (1999). Pattern of nucleotide substitution and rate heterogeneity in the hypervariable regions I and II of human mtDNA. Genetics 152, 1103–1110.
Milne, I., Wright, F., Rowe, G., Marshall, D. F., Husmeier, D. & McGuire, G. (2004). TOPALi: software for automatic identification of recombinant sequences within DNA multiple alignments. Bioinformatics 20, 1806–1807.
Nishizawa, T., Takahashi, M., Mizuo, H., Miyajima, H., Gotanda, Y. & Okamoto, H. (2003). Characterization of Japanese swine and human hepatitis E virus isolates of genotype IV with 99 % identity over the entire genome. J Gen Virol 84, 1245–1251.
Pugachev, K. V., Abernathy, E. S. & Frey, T. K. (1997). Genomic sequence of the RA27/3 vaccine strain of rubella virus. Arch Virol 142, 1165–1180.
Pugachev, K. V., Galinski, M. S. & Frey, T. K. (2000). Infectious cDNA clone of the RA27/3 vaccine strain of Rubella virus. Virology 273, 189–197.
Reef, S. E., Frey, T. K., Theall, K., Abernathy, E., Burnett, C. L., Icenogle, J., McCauley, M. M. & Wharton, M. (2002). The changing epidemiology of rubella in the 1990s: on the verge of elimination and new challenges for control and prevention. JAMA 287, 464–472.
Robertson, S. E., Featherstone, D. A., Gacic-Dobo, M. & Hersh, B. S. (2003). Rubella and congenital rubella syndrome: global update. Rev Panam Salud Publica 14, 306–315.
Rozanov, M. N., Koonin, E. V. & Gorbalenya, A. E. (1992). Conservation of the putative methyltransferase domain: a hallmark of the Sindbis-like supergroup of positive-strand RNA viruses. J Gen Virol 73, 2129–2134.
Saitoh, M., Shinkawa, N., Shimada, S., Segawa, Y., Sadamasu, K., Hasegawa, M., Kato, M., Kozawa, K., Kuramoto, T. & other authors (2006). Phylogenetic analysis of envelope glycoprotein (E1) gene of rubella viruses prevalent in Japan in 2004. Microbiol Immunol 50, 179–185.
Saleh, S. M., Poidinger, M., Mackenzie, J. S., Broom, A. K., Lindsay, M. D. & Hall, R. A. (2003). Complete genomic sequence of the Australian south-west genotype of Sindbis virus: comparisons with other Sindbis strains and identification of a unique deletion in the 3'-untranslated region. Virus Genes 26, 317–327.
Salemi, M. & Vandamme, A.-M. (2003). The Phylogenetic Handbook: a Practical Approach to DNA and Protein Phylogeny. Cambridge: Cambridge University Press.
Strimmer, K. & von Haeseler, A. (1996). Quartet puzzling: a quartet maximum-likelihood method for reconstructing tree topologies. Mol Biol Evol 13, 964–969.
Takahashi, K., Kang, J. H., Ohnishi, S., Hino, K., Miyakawa, H., Miyakawa, Y., Maekubo, H. & Mishiro, S. (2003). Full-length sequences of six hepatitis E virus isolates of genotypes III and IV from patients with sporadic acute or fulminant hepatitis in Japan. Intervirology 46, 308–318.
Tamura, K. & Nei, M. (1993). Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol Biol Evol 10, 512–526.
Tarbatt, C. J., Glasgow, G. M., Mooney, D. A., Sheahan, B. J. & Atkins, G. J. (1997). Sequence analysis of the avirulent, demyelinating A7 strain of Semliki Forest virus. J Gen Virol 78, 1551–1557.
van Cuyck, H., Juge, F. & Roques, P. (2003). Phylogenetic analysis of the first complete hepatitis E virus (HEV) genome from Africa. FEMS Immunol Med Microbiol 39, 133–139.
WHO (2005). Standardization of the nomenclature for genetic characteristics of wild-type rubella viruses. Wkly Epidemiol Rec 80, 126–132.
Xia, X. (2000). Data Analysis in Molecular Biology and Evolution. Boston: Kluwer Academic Publishers.
Xia, X. & Xie, Z. (2001). DAMBE: software package for data analysis in molecular biology and evolution. J Hered 92, 371–373.
Yang, Z. (1994). Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J Mol Evol 39, 306–314.
Yang, D. K., Kim, B. H., Kweon, C. H., Kwon, J. H., Lim, S. I. & Han, H. R. (2004). Molecular characterization of full-length genome of Japanese encephalitis virus (KV1899) isolated from pigs in Korea. J Vet Sci 5, 197–205.
Zheng, D. P., Frey, T. K., Icenogle, J., Katow, S., Abernathy, E. S., Song, K. J., Xu, W. B., Yarulin, V., Desjatskova, R. G. & other authors (2003a). Global distribution of rubella virus genotypes. Emerg Infect Dis 9, 1523–1530.
Zheng, D. P., Zhou, Y. M., Zhao, K., Han, Y. R. & Frey, T. K. (2003b). Characterization of genotype II rubella virus strains. Arch Virol 148, 1835–1850.
Zheng, D. P., Zhu, H., Revello, M. G., Gerna, G. & Frey, T. K. (2003c). Phylogenetic analysis of rubella virus isolated during a period of epidemic transmission in Italy, 1991–1997. J Infect Dis 187, 1587–1597.
Received 24 August 2006; accepted 20 November 2006.