Short Communication

Division of Viral Hepatitis, MS-A33, Centers for Disease Control and Prevention, 1600 Clifton Road NE, Atlanta, GA 30329-4018, USA
Correspondence
Michael A. Purdy
mup3{at}cdc.gov
| ABSTRACT |
|---|
Present address: Jet Propulsion Laboratory, 4800 Oak Grove Drive, Pasadena, CA 91109, USA.
High-resolution versions of the figures shown are available with the online version of this paper.
| MAIN TEXT |
|---|
HBV genomes are segregated into eight genotypes based on sequence divergences (Norder et al., 1992; Bowyer & Sim, 2000; Stuyver et al., 2000; Arauz-Ruiz et al., 2002; Akuta & Kumada, 2005). Genotypes are defined as sequences within a phylogenetic clade having less than 4 % sequence divergence among the members of the clade and more than 8 % sequence divergence with extra-clade sequences (Okamoto et al., 1988; Norder et al., 1994; Bowyer & Sim, 2000). Divergences as high as 13.6 and 15.5 % have been noted for genotypes E and F with respect to genotypes A–D (Norder et al., 1994). Phylogenetic analysis of different regions and genes within the HBV genome has shown that, although the branching order of genotypes differs from that obtained with the whole genome, the sequences belonging to specific genotypes consistently cluster together (Norder et al., 1994; Kato et al., 2002).
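The divergence thresholds in this definition can be expressed as a simple check. The following is a minimal sketch only: the function name, the data layout and the toy divergence values are assumptions made for illustration and are not taken from the cited studies.

```python
def satisfies_genotype_definition(labels, divergence,
                                  within_max=0.04, between_min=0.08):
    """Check the <4 % within-clade and >8 % between-clade thresholds.

    labels:     dict mapping sequence name -> genotype label
    divergence: dict mapping frozenset({a, b}) -> divergence as a fraction
    """
    for pair, d in divergence.items():
        a, b = tuple(pair)
        same_genotype = labels[a] == labels[b]
        if same_genotype and d >= within_max:
            return False  # clade members too divergent from each other
        if not same_genotype and d <= between_min:
            return False  # sequences from different clades too similar
    return True


# Toy values, invented for illustration.
toy_labels = {"x1": "A", "x2": "A", "y1": "B"}
toy_divergence = {
    frozenset(("x1", "x2")): 0.02,
    frozenset(("x1", "y1")): 0.10,
    frozenset(("x2", "y1")): 0.09,
}
ok = satisfies_genotype_definition(toy_labels, toy_divergence)
```

A labelling passes only if every within-genotype pair is below 4 % and every between-genotype pair is above 8 %.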
Recombination within and between genotypes has created complex patterns of evolution (Bollyky & Holmes, 1999; Bowyer & Sim, 2000). Recombination can alter the cladistic structure of genotypes. Within the core region alone, recombination is considered to have led to the formation of the Ba subgenotype from genotypes B and C (Sugauchi et al., 2002) and to the apparent merging of genotypes D and E (Bowyer & Sim, 2000). A recent report on the potential recombinant nature of genotype G adds another example of a genotype, in addition to genotype E, all member strains of which originate from one ancestral intergenotype recombinant strain (Simmonds & Midgley, 2005). These observations emphasize a need for systematic investigation of genealogical relationships among genotypes along the HBV genome. In the present study, we conducted such an investigation of HBV genotypes using different regions of the HBV genome based on information available from a large database of sequences.
A set of 311 full-length human HBV sequences was obtained from GenBank. Non-human primate sequences and sequences with deletions were excluded. When multiple sequences came from a single patient, only one sequence was chosen. HBV strains with intergenotypic recombination have been found by many researchers (Bollyky & Holmes, 1999; Bowyer & Sim, 2000; Fares & Holmes, 2002; Simmonds & Midgley, 2005). SimPlot was used to remove individual sequences with evidence of intergenotypic recombination (Lole et al., 1999). Identification of recombination by SimPlot requires the use of reference sequences representing each HBV genotype. These sequences were selected by summing the square of the genetic distances between a given sequence and all other sequences in the database belonging to the same genotype, and selecting the sequence with the lowest sum-squared distance for each genotype. This approach to selecting sequences ensured that the identified benchmark sequences were representative of an entire genotype and, as such, were devoid of any elements that were uncharacteristic for the selected genotype. The use of actual instead of consensus sequences guaranteed against any potential artefactual structural features being introduced by generating the consensus. The GenBank accession numbers for the sequences selected were: AY128092 (genotype A), D23679 (genotype B), AF458664 (genotype C), AF121240 (genotype D), AB032431 (genotype E), AB036905 (genotype F), AF160501 (genotype G), AY090457 (genotype H) and AF046996 (woolly monkey HBV). These sequences were tested with SimPlot and RDP (Martin & Rybicki, 2000) to ensure they were free from recombination. No attempts were made to remove genotype- and subgenotype-wide recombinants such as genotype E (Bowyer & Sim, 2000) and subgenotype Ba sequences (Sugauchi et al., 2002). Only single intergenotypic recombinant sequences were removed from further analysis.
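The selection of a benchmark sequence by lowest sum-squared distance can be sketched as follows. This is an illustration under stated assumptions, not the code used in the study: the function name, the dictionary layout for pairwise distances and the toy values are hypothetical.

```python
def pick_representative(names, dist):
    """Return the sequence with the smallest sum of squared distances
    to all other sequences of the same genotype.

    names: list of sequence identifiers belonging to ONE genotype
    dist:  dict mapping frozenset({a, b}) -> pairwise genetic distance
    """
    def sum_squared(a):
        return sum(dist[frozenset((a, b))] ** 2 for b in names if b != a)

    return min(names, key=sum_squared)


# Toy distances, invented for illustration (not taken from the study).
toy_names = ["s1", "s2", "s3"]
toy_dist = {
    frozenset(("s1", "s2")): 0.01,
    frozenset(("s1", "s3")): 0.03,
    frozenset(("s2", "s3")): 0.02,
}
representative = pick_representative(toy_names, toy_dist)
```

In this toy set, `s2` has the lowest sum (0.0005, against 0.0010 for `s1` and 0.0013 for `s3`) and is selected. Because the result is always an actual member of the set, no consensus sequence is generated.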
Sequences from the resulting dataset were aligned using CLUSTAL_X (Thompson et al., 1997), and all positions with gaps in the aligned sequences and the 36 nt insert in the core gene of genotype G sequences (Stuyver et al., 2000) were removed. No attempt was made to ensure that codon integrity was maintained. Woolly monkey HBV (GenBank accession no. AF046996) was used as an outgroup. The final dataset contained 241 human HBV sequences.
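Removal of gapped positions amounts to dropping every alignment column in which any sequence carries a gap. A minimal sketch is given below, assuming the alignment is held as a dictionary of equal-length strings with `-` as the gap character; the function name and toy alignment are hypothetical.

```python
def strip_gap_columns(aligned, gap="-"):
    """Drop every alignment column in which at least one sequence has a gap.

    aligned: dict mapping sequence name -> aligned sequence string
    """
    seqs = list(aligned.values())
    if not seqs:
        return {}
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    # Indices of columns that are gap-free in every sequence.
    keep = [i for i in range(len(seqs[0])) if all(s[i] != gap for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in aligned.items()}


# Toy alignment, invented for illustration.
toy_alignment = {"a": "AC-GT", "b": "ACTGT", "c": "A-TGT"}
stripped = strip_gap_columns(toy_alignment)
```

Columns are removed one at a time without regard to reading frame, which is why codon integrity is not preserved by this step.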
Originally, the HBV sequence dataset was examined using a sliding window of 700 bases with a 50-base step to create phylogenetic trees for each window (data not shown) to determine phylogenetic relationships between different regions of the genome in a systematic manner. However, evaluation of variations in topologies of a large set of trees is difficult. To facilitate analysis of changes of intergenotypic distances among HBV genotypes within different regions across the genome, a graphical method was devised.
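The window coordinates for a 700-base window with a 50-base step can be enumerated as in the sketch below. It assumes a linear alignment with no wrap-around past the end, which the text does not specify for the circular HBV genome; the alignment length of 3000 is a placeholder, not a value from the study.

```python
def sliding_windows(length, window=700, step=50):
    """Yield (start, end) half-open index pairs for every full window.

    Assumes a linear alignment: windows do not wrap around the end.
    """
    for start in range(0, length - window + 1, step):
        yield start, start + window


# Placeholder alignment length, chosen only for illustration.
windows = list(sliding_windows(3000))
```

Each `(start, end)` pair can then be used to slice every aligned sequence before building a tree or distance matrix for that window.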
This method is based on plots that depict the distances from two different genotypic sequences, called query sequences Q1 and Q2, to a third sequence, called the reference sequence R. A sliding window of 700 bases with a 50-base step was used to examine the set of nine benchmark sequences described above. The 700-base window was chosen to minimize multifurcations and intergenotypic mixing of sequences (Nei et al., 1998). A distance matrix was generated for each window using TREE-PUZZLE (Strimmer & von Haeseler, 1996). Graphs of the matrix distances were created by plotting the distances for Q1 and Q2 with respect to R. The distance values for each window for a specific pair of Q1 and Q2 to a specified R were plotted as d(R,Q2) versus d(R,Q1) (Fig. 1). The plot shows the change in this value within each sliding window across the HBV genome. It was assumed that, if rates of mutation are constant between Q1 and Q2 with respect to R across the genome, a straight-line plot would be expected. As the rate of mutation across the HBV genome is not constant (Bollyky & Holmes, 1999; Hannoun et al., 2000; Fares & Holmes, 2002), there will be deviations from a straight line. Large deviations between sequences may represent either large deviations in the rate of mutation between the query sequences or recombination events. The distance matrix plot alone cannot make a distinction between these two explanations. Additionally, if R is phylogenetically at an equal distance from both Q1 and Q2, the distance matrix plot for all sliding windows along the HBV genome should be spread along the straight line at 45°. Since each sequence for this analysis was specifically selected to represent the entire genotype, it was expected that all plots would follow this pattern.
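The construction of the d(R,Q2) versus d(R,Q1) plot can be sketched as follows. Note the substitution: the study used TREE-PUZZLE distance matrices, whereas this sketch uses the simple proportion of differing sites (p-distance) so that it stays self-contained. The function names, the tiny window size and the toy sequences are assumptions made for illustration.

```python
def p_distance(a, b):
    """Proportion of differing sites between two equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_plot_points(ref, q1, q2, window=700, step=50):
    """Return one (d(R,Q1), d(R,Q2)) point per sliding window."""
    points = []
    for start in range(0, len(ref) - window + 1, step):
        end = start + window
        points.append((p_distance(ref[start:end], q1[start:end]),
                       p_distance(ref[start:end], q2[start:end])))
    return points


# Toy sequences with a tiny window, invented for illustration.
R = "AAAAAAAA"
Q1 = "CAAACAAA"
Q2 = "CAAACCCA"
points = distance_plot_points(R, Q1, Q2, window=4, step=4)
# Signed offset from the 45-degree line: zero when R is equidistant.
offsets = [d2 - d1 for d1, d2 in points]
```

In the toy data the first window falls on the 45° line, while in the second window Q2 is three times as far from R as Q1 is. As noted above, an offset of this kind cannot by itself distinguish rate variation from recombination.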
To examine this deviation from expected behaviour in more detail, the frequency distribution of phylogenetic distances for the 242-sequence dataset (241 human sequences plus the woolly monkey sequence) was determined. To do this, DNADIST with the Kimura two-parameter model was used to generate distance matrices (Felsenstein, 1989). Frequency distributions were generated by segregating these phylogenetic distances into 80 equal-sized distance categories and plotting the frequency of occurrence for all distances per distance bin using the frequency function in Microsoft Excel. If HBV was stratified only as isolates (subgenotypes) and genotypes, a two-peak frequency distribution should become evident (excluding the outgroup peak) when the frequency of phylogenetic distances was plotted for all genotypes. However, a three-peak distribution was observed when complete HBV genome sequences were used (data not shown).
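The two steps described here, computing Kimura two-parameter distances and sorting them into equal-sized bins, can be sketched as below. This is an illustration under stated assumptions and not a reimplementation of DNADIST or of the Excel frequency function. It uses the closed-form Kimura estimate d = -½ ln[(1 - 2P - Q)√(1 - 2Q)], where P and Q are the proportions of transitions and transversions, and assumes gap-free, upper-case A/C/G/T input. Bin-edge handling may differ from Excel, and the function names are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(a, b):
    """Closed-form Kimura two-parameter distance for A/C/G/T sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    transitions = transversions = 0
    for x, y in zip(a, b):
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1      # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1    # purine<->pyrimidine
    p = transitions / len(a)
    q = transversions / len(a)
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the K2P estimate")
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def bin_distances(distances, n_bins=80):
    """Count distances in n_bins equal-width bins spanning min..max."""
    lo, hi = min(distances), max(distances)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for d in distances:
        # The maximum value is placed in the last bin.
        idx = min(int((d - lo) / width), n_bins - 1) if width > 0 else 0
        counts[idx] += 1
    return counts


# Toy example, invented for illustration: one transition in ten sites.
toy_d = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
toy_counts = bin_distances([0.0, 0.1, 0.2, 0.3, 1.0], n_bins=4)
```

Plotting the bin counts against the bin positions gives the kind of frequency distribution in which separate peaks for within-genotype and between-genotype distances would appear.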
Because of the aberrant behaviour seen with genotype G (Fig. 1e, f), the X/core region (nt 1331–2291) was first excluded from analysis. With the X/core region excluded, the three-peak structure of the distribution was better delineated (Fig. 2a). A phylogenetic tree created using DNADIST and NEIGHBOR (Felsenstein, 1989) with the Kimura two-parameter model revealed two groups of genotypes: group I was composed of genotypes A–E and G, and group II of genotypes F and H (Fig. 2c).
Similar analysis with the X/core region alone (nt 1331–2291) yielded a phylogenetic tree in which HBV genotypes fell into three groups. The first group comprised genotypes A–E, the second genotypes F and H, and the third genotype G (Fig. 3c). Similar results were obtained using maximum-likelihood phylogenetic reconstructions (Guindon & Gascuel, 2003) on the set of nine benchmark sequences (data not shown). The frequency distribution footprint showed that all genotype G distances, other than those within genotype G, were now within the range of intergroup distances (Fig. 3b, box 3, light shaded bars). The movement of genotype G from group I to group III in the X/core region explains the deviation seen in the distance matrix plots, which included genotype G as one of the query sequences (Fig. 1e, f). The data are consistent with genotype G having resulted from a recombination event between the polymerase gene of a parental genotype G and the non-polymerase region of an as-yet-undiscovered HBV isolate, for which a complete nucleotide sequence is not known. Non-human primate HBV sequences were tested and none was found to be a parental donor for the X/core region of genotype G.
Recombination has played a significant role in shaping some extant subgenotypes and genotypes (Bowyer & Sim, 2000; Sugauchi et al., 2002). Noticeable traces of intergenotypic recombination can be found within any region of the HBV genome (Simmonds & Midgley, 2005), with a hot spot being located at nt 1627–3252, which includes the X/core region (Zhou & Holmes, 2007). However, it is important to note that the majority of these recombinants are represented by a single HBV strain, which suggests that all of these strains have a very recent origin and/or limited fitness benefits. Data presented here on genotype G recombination, as well as published data on recombination between genotypes D and E (Bowyer & Sim, 2000) and genotypes B and C (Sugauchi et al., 2002), indicate that only recombination between strains from distant HBV clades within the genomic region outside the polymerase gene (within the X/core region) may generate progeny strains with significant evolutionary advantages over the parental strains.
In conclusion, present-day HBV research is orientated towards examining HBV genotypes and subgenotypes. The finding of HBV genotypes being organized into three higher-order supragenotype groups leads to the understanding that HBV sequences are organized in a more complex hierarchy. This finding may have significant implications for HBV biology, pathobiology, vaccine-implementation strategies and diagnostic assay development.
| ACKNOWLEDGEMENTS |
|---|
| REFERENCES |
|---|
Arauz-Ruiz, P. H., Norder, H., Robertson, B. H. & Magnius, L. O. (2002). Genotype H: a new Amerindian genotype of hepatitis B virus revealed in Central America. J Gen Virol 83, 2059–2073.
Bollyky, P. L. & Holmes, E. C. (1999). Reconstructing the complex evolutionary history of hepatitis B virus. J Mol Evol 49, 130–141.
Bowyer, S. M. & Sim, J. G. M. (2000). Relationships within and between genotypes of hepatitis B virus at points across the genome: footprints of recombination in certain isolates. J Gen Virol 81, 379–392.
Fares, M. A. & Holmes, E. C. (2002). A revised evolutionary history of hepatitis B virus (HBV). J Mol Evol 54, 807–814.
Felsenstein, J. (1989). PHYLIP – phylogeny inference package (version 3.2). Cladistics 5, 164–166.
Guindon, S. & Gascuel, O. (2003). A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 52, 696–704.
Hannoun, C., Horal, P. & Lindh, M. (2000). Long-term mutation rates in the hepatitis B virus genome. J Gen Virol 81, 75–83.
Kato, H., Orito, E., Gish, R. G., Bzowej, N., Newsom, M., Sugauchi, F., Suzuki, S., Ueda, R., Miyakawa, Y. & other authors (2002). Hepatitis B e antigen in sera from individuals infected with hepatitis B virus of genotype G. Hepatology 35, 922–929.
Lole, K. S., Bollinger, R. C., Paranjape, R. S., Gadkari, D., Kulkarni, S. S., Novak, N. G., Ingersoll, R., Sheppard, H. W. & Ray, S. C. (1999). Full-length human immunodeficiency virus type 1 genomes from subtype C-infected seroconverters in India, with evidence of intersubtype recombination. J Virol 73, 152–160.
Magnius, L. O. & Norder, H. (1995). Subtypes, genotypes and molecular epidemiology of the hepatitis B virus as reflected by sequence variability of the S-gene. Intervirology 38, 24–34.
Martin, D. & Rybicki, E. (2000). RDP: detection of recombination amongst aligned sequences. Bioinformatics 16, 562–563.
Nei, M., Kumar, S. & Takahashi, K. (1998). The optimization principle in phylogenetic analysis tends to give incorrect topologies when the number of nucleotides or amino acids used is small. Proc Natl Acad Sci U S A 95, 12390–12397.
Norder, H., Couroucé, A. M. & Magnius, L. O. (1992). Molecular basis of hepatitis B virus variations within the four major subtypes. J Gen Virol 73, 3141–3145.
Norder, H., Couroucé, A. M. & Magnius, L. O. (1994). Complete genomes, phylogenetic relatedness, and structural proteins of six strains of the hepatitis B virus, four of which represent two new genotypes. Virology 198, 489–503.
Norder, H., Couroucé, A.-M., Coursaget, P., Echevarria, J. M., Lee, S.-D., Mushahwar, I. K., Robertson, B. H., Locarnini, S. & Magnius, L. O. (2004). Genetic diversity of hepatitis B virus strains derived worldwide: genotypes, subgenotypes, and HBsAg subtypes. Intervirology 47, 289–309.
Okamoto, H., Tsuda, F., Sakugawa, H., Sastrosoewignjo, R. I., Imai, M., Miyakawa, Y. & Mayumi, M. (1988). Typing hepatitis B virus by homology in nucleotide sequence: comparison of surface antigen subtypes. J Gen Virol 69, 2575–2583.
Simmonds, P. & Midgley, S. (2005). Recombination in the genesis and evolution of hepatitis B virus genotypes. J Virol 79, 15467–15476.
Strimmer, K. & von Haeseler, A. (1996). Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies. Mol Biol Evol 13, 964–969.
Stuyver, L., De Gendt, S., Van Geyt, C., Zoulim, F., Fried, M., Schinazi, R. F. & Rossau, R. (2000). A new genotype of hepatitis B virus: complete genome and phylogenetic relatedness. J Gen Virol 81, 67–74.
Sugauchi, F., Orito, E., Ichida, T., Kato, H., Sakugawa, H., Kakumu, S., Ishida, T., Chutaputti, A., Lai, C. L. & other authors (2002). Hepatitis B virus of genotype B with or without recombination with genotype C over the precore region plus the core gene. J Virol 76, 5985–5992.
Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F. & Higgins, D. G. (1997). The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25, 4876–4882.
Zhou, Y. & Holmes, E. C. (2007). Bayesian estimations of the evolutionary rate and age of hepatitis B virus. J Mol Evol 65, 197–205.
Received 17 August 2007; accepted 4 February 2008.