J Gen Virol


     


J Gen Virol 89 (2008), 1179-1183; DOI 10.1099/vir.0.83392-0


Short Communication

Supragenotypic groups of the hepatitis B virus genome

Michael A. Purdy, Aileen C. Gonzales†, Zoya Dimitrova and Yury Khudyakov

Division of Viral Hepatitis, MS-A33, Centers for Disease Control and Prevention, 1600 Clifton Road NE, Atlanta, GA 30329-4018, USA

Correspondence
Michael A. Purdy
mup3@cdc.gov


   ABSTRACT
 
Phylogenetic relationships among hepatitis B virus (HBV) genotypes were investigated using different regions across the genome. The phylogenetic analysis, in conjunction with graphical examination of phylogenetic distance matrices and distance frequency distribution plotting, suggests the clustering of HBV genotypes into three higher-order hierarchical groups: group I, comprising genotypes A–E and G; group II, comprising genotypes F and H; and a hypothetical group III. Present-day genotype G is postulated to be a recombinant with the non-polymerase region of group III virus and the polymerase gene of an ancestral virus belonging to group I.

†Present address: Jet Propulsion Laboratory, 4800 Oak Grove Drive, Pasadena, CA 91109, USA.

High-resolution versions of the figures shown are available with the online version of this paper.


   MAIN TEXT
 
Hepatitis B virus (HBV), a member of the family Hepadnaviridae, has a circular, partially double-stranded DNA genome of about 3200 nt (Magnius & Norder, 1995). The genome is highly compact, containing four open reading frames (ORFs) encoding the polymerase, surface antigen (S), nucleocapsid (core) and X proteins. These ORFs show significant overlap, with almost half of the HBV genome encoding more than one protein product. Substitution rates in the overlapping and non-overlapping genomic regions vary (Bollyky & Holmes, 1999; Zhou & Holmes, 2007).

HBV genomes are segregated into eight genotypes based on sequence divergences (Norder et al., 1992; Bowyer & Sim, 2000; Stuyver et al., 2000; Arauz-Ruiz et al., 2002; Akuta & Kumada, 2005). Genotypes are defined as sequences within a phylogenetic clade having less than 4 % sequence divergence among the members of the clade and more than 8 % sequence divergence with extra-clade sequences (Okamoto et al., 1988; Norder et al., 1994; Bowyer & Sim, 2000). Divergences as high as 13.6 and 15.5 % have been noted for genotypes E and F with respect to genotypes A–D (Norder et al., 1994). Phylogenetic analysis of different regions and genes within the HBV genome has shown that, although the branching order of genotypes differs from that of the whole genome, the sequences belonging to specific genotypes consistently cluster together (Norder et al., 1994; Kato et al., 2002).

Recombination within and between genotypes has created complex patterns of evolution (Bollyky & Holmes, 1999; Bowyer & Sim, 2000). Recombination can alter the cladistic structure of genotypes. Within the core region alone, recombination between genotypes B and C is considered to lead to the formation of the Ba subgenotype (Sugauchi et al., 2002) and the apparent merging of genotypes D and E (Bowyer & Sim, 2000). A recent report on the potential recombinant nature of genotype G adds another example of a genotype, in addition to genotype E, all member strains of which originate from one ancestral intergenotype recombinant strain (Simmonds & Midgley, 2005). These observations emphasize a need for systematic investigation of genealogical relationships among genotypes along the HBV genome. In the present study, we conducted such an investigation of HBV genotypes using different regions of the HBV genome based on information available from a large database of sequences.

A set of 311 full-length human HBV sequences was obtained from GenBank. Non-human primate sequences and sequences with deletions were excluded. When multiple sequences came from a single patient, only one sequence was chosen. HBV strains with intergenotypic recombination have been found by many researchers (Bollyky & Holmes, 1999; Bowyer & Sim, 2000; Fares & Holmes, 2002; Simmonds & Midgley, 2005). SimPlot was used to remove individual sequences with evidence of intergenotypic recombination (Lole et al., 1999). Identification of recombination by SimPlot requires the use of reference sequences representing each HBV genotype. These sequences were selected by summing the square of the genetic distances between a given sequence and all other sequences in the database belonging to the same genotype, and selecting the sequence with the lowest sum-squared distance for each genotype. This approach to selecting sequences ensured that the identified benchmark sequences were representative of an entire genotype and, as such, were devoid of any elements that were uncharacteristic for the selected genotype. The use of actual instead of consensus sequences guaranteed against any potential artefactual structural features being introduced by generating the consensus. The GenBank accession numbers for the sequences selected were: AY128092 (genotype A), D23679 (genotype B), AF458664 (genotype C), AF121240 (genotype D), AB032431 (genotype E), AB036905 (genotype F), AF160501 (genotype G), AY090457 (genotype H), and AF046996 (woolly monkey HBV). These sequences were tested with SimPlot and RDP (Martin & Rybicki, 2000) to ensure they were free from recombination. No attempts were made to remove genotype- and subgenotype-wide recombinants such as genotype E (Bowyer & Sim, 2000) and subgenotype Ba sequences (Sugauchi et al., 2002). Only single intergenotypic recombinant sequences were removed from further analysis.
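
The benchmark selection rule described above (lowest sum of squared distances to all other members of the same genotype) can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the function name `select_benchmarks`, the input layout and the toy distances are assumptions made for the example.

```python
from collections import defaultdict


def select_benchmarks(names, genotypes, dist):
    """Return {genotype: name} of the sequence with the lowest sum of
    squared distances to all other sequences of the same genotype.

    names     -- list of sequence identifiers
    genotypes -- list of genotype labels, parallel to names
    dist      -- square symmetric matrix (list of lists) of pairwise distances
    """
    members = defaultdict(list)
    for idx, genotype in enumerate(genotypes):
        members[genotype].append(idx)

    benchmarks = {}
    for genotype, idxs in members.items():
        # Sum of squared distances from candidate i to its genotype-mates.
        def sum_sq(i, idxs=idxs):
            return sum(dist[i][j] ** 2 for j in idxs if j != i)

        benchmarks[genotype] = names[min(idxs, key=sum_sq)]
    return benchmarks


# Toy data: hypothetical identifiers and distances, not taken from the study.
names = ["a1", "a2", "a3", "b1", "b2"]
genotypes = ["A", "A", "A", "B", "B"]
dist = [
    [0.00, 0.01, 0.03, 0.10, 0.11],
    [0.01, 0.00, 0.02, 0.10, 0.10],
    [0.03, 0.02, 0.00, 0.12, 0.11],
    [0.10, 0.10, 0.12, 0.00, 0.02],
    [0.11, 0.10, 0.11, 0.02, 0.00],
]
benchmarks = select_benchmarks(names, genotypes, dist)
print(benchmarks)
```

Because the selected sequence is an actual member of the dataset rather than a consensus, the result is always a real isolate; ties are resolved here by input order, which is an arbitrary choice of this sketch.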

Sequences from the resulting dataset were aligned using CLUSTAL_X (Thompson et al., 1997), and all positions with gaps in the aligned sequences and the 36 nt insert in the core gene of genotype G sequences (Stuyver et al., 2000) were removed. No attempt was made to ensure that codon integrity was maintained. Woolly monkey HBV (GenBank accession no. AF046996) was used as an outgroup. The final dataset contained 241 human HBV sequences.
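
Removal of gapped alignment positions, as described above, can be illustrated with a short sketch. The function name `strip_gapped_columns` and the toy alignment are assumptions for illustration; the sketch simply drops every column in which any sequence carries a gap character, without regard to codon boundaries.

```python
def strip_gapped_columns(alignment, gap_chars="-."):
    """Drop every alignment column in which at least one sequence has a gap.

    alignment -- dict mapping sequence name to aligned sequence (equal lengths)
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    # Indices of columns that are gap-free in every sequence.
    keep = [col for col in range(length)
            if all(s[col] not in gap_chars for s in seqs)]
    return {name: "".join(seq[col] for col in keep)
            for name, seq in alignment.items()}


# Toy alignment (hypothetical sequences).
toy = {"seq1": "ACG-TAC", "seq2": "ACGTTAC", "seq3": "A-GTTAC"}
stripped = strip_gapped_columns(toy)
print(stripped)
```

In an alignment where the other genotypes carry gaps opposite an insert present in only one genotype, the same rule would also remove that insert.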

Originally, the HBV sequence dataset was examined using a sliding window of 700 bases with a 50-base step to create phylogenetic trees for each window (data not shown) to determine phylogenetic relationships between different regions of the genome in a systematic manner. However, evaluation of variations in topologies of a large set of trees is difficult. To facilitate analysis of changes of intergenotypic distances among HBV genotypes within different regions across the genome, a graphical method was devised.

This method is based on plots that depict the distances from two different genotypic sequences, called query sequences Q1 and Q2, to a third sequence, called the reference sequence R. A sliding window of 700 bases with a 50-base step was used to examine the set of nine benchmark sequences described above. The 700-base window was chosen to minimize multifurcations and intergenotypic mixing of sequences (Nei et al., 1998). A distance matrix was generated for each window using TREE-PUZZLE (Strimmer & von Haeseler, 1996). Graphs of the matrix distances were created by plotting the distances for Q1 and Q2 with respect to R. The distance values for each window for a specific pair of Q1 and Q2 to a specified R were plotted as d(R,Q2) versus d(R,Q1) (Fig. 1). The plot shows the change in this value within each sliding window across the HBV genome. It was assumed that, if rates of mutation are constant between Q1 and Q2 with respect to R across the genome, a straight-line plot would be expected. As the rate of mutation across the HBV genome is not constant (Bollyky & Holmes, 1999; Hannoun et al., 2000; Fares & Holmes, 2002), there will be deviations from a straight line. Large deviations between sequences may represent either large deviations in the rate of mutation between the query sequences or recombination events. The distance matrix plot alone cannot make a distinction between these two explanations. Additionally, if R is phylogenetically at an equal distance from both Q1 and Q2, the distance matrix plot for all sliding windows along the HBV genome should be spread along the straight line at 45°. Since each sequence for this analysis was specifically selected to represent the entire genotype, it was expected that all plots would follow this pattern.
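
The windowing scheme behind these plots can be sketched in a few lines. This is a simplified illustration under stated assumptions: the uncorrected proportion of differing sites (p-distance) is used as a stand-in for the TREE-PUZZLE distances, windows are assumed to wrap around the circular genome, and all function names and toy sequences are hypothetical.

```python
def circular_windows(length, size=700, step=50):
    """Yield (start, indices) for windows that wrap around a circular genome."""
    for start in range(0, length, step):
        yield start, [(start + k) % length for k in range(size)]


def p_distance(seq_a, seq_b, idxs):
    """Proportion of differing sites over the given positions.

    A simple stand-in for a model-based distance.
    """
    diffs = sum(1 for i in idxs if seq_a[i] != seq_b[i])
    return diffs / len(idxs)


def distance_plot_points(ref, q1, q2, size=700, step=50):
    """Return [(window_start, d(R,Q1), d(R,Q2))] for every window."""
    return [(start, p_distance(ref, q1, idxs), p_distance(ref, q2, idxs))
            for start, idxs in circular_windows(len(ref), size, step)]


# Toy example with a tiny "genome" so the windows can be checked by hand.
ref = "AAAAAAAA"
q1 = "AAAATTAA"
q2 = "TTAAAAAA"
points = distance_plot_points(ref, q1, q2, size=4, step=2)
for start, d1, d2 in points:
    print(start, d1, d2)
```

Plotting the third value of each tuple against the second gives one point per window; points lying on the 45° line correspond to windows in which R is equally distant from Q1 and Q2.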



 
Fig. 1. Representative distance matrix plots. Distances: ○, nt 162–821 (HBV surface antigen); △, nt 841–1361; □, nt 1381–1821 (X); ◇, nt 1841–2241 (core); and ▽, nt 2261–142. (a) Distances from genotype B to genotypes A and C; (b) distances from genotype B to genotypes A and H; (c) distances from genotype E to genotypes D and G; (d) distances from genotype B to genotypes E and G; (e) distances from genotype D to genotypes E and G; (f) distances from genotype H to genotypes B and G.

 
However, using these distance plots, HBV genotypes were found to be clustered into two distinct groups. The first group comprised genotypes A–E and the second genotypes F and H. Whenever Q1 and Q2 were from genotypes A–E, the value of the ratios approximated 1 (Fig. 1b). The same outcome was achieved when genotype F and H sequences formed the query pair (Fig. 1a). These results indicated that the members of each group evolved at a similar rate to the other members within the group. Whenever one sequence in a query pair was genotype A–E and the second member was either genotype F or H, the values of the ratios were about 2 or 0.5, depending on how the distance values were plotted (Fig. 1c, d). This outcome indicated that the two groups, one being composed of genotypes A–E and the other of genotypes F and H, were evolving at different rates. Such deviation suggests differences in hierarchical structure among genotypes. When genotype G was used as one of the query sequences, deviation from linearity was observed on all plots in the X/core region (between nt 1331 and 2291) (Fig. 1e, f). This finding suggests that the X/core region of genotype G has experienced a higher rate of evolution compared with the corresponding region of any other query sequence used in the distance matrix plot. This observation could be due to either a higher rate of mutation in genotype G or a recombination event. Without the X/core region included, the genotype G sequence behaved like the genotype A–E sequences.

To examine this deviation from expected behaviour in more detail, the frequency distribution of phylogenetic distances for the 242 sequence dataset (241 human sequences plus the woolly monkey sequence) was determined. To do this, DNADIST with the Kimura two-parameter model was used to generate distance matrices (Felsenstein, 1989). Frequency distributions were generated by segregating these phylogenetic distances into 80 equal-sized distance categories and plotting the frequency of occurrence for all distances per distance bin using the frequency function in Microsoft Excel. If HBV was stratified only as isolates (subgenotypes) and genotypes, a two-peak frequency distribution should become evident (excluding the outgroup peak) when the frequency of phylogenetic distances was plotted for all genotypes. However, a three-peak distribution was observed when complete HBV genome sequences were used (data not shown).
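
The two computational steps in this paragraph, a Kimura two-parameter distance and the binning of pairwise distances into equal-sized categories, can be sketched as below. This is not DNADIST or the Excel procedure used in the study: the function names are hypothetical, the sequences are toy data, and the bins are assumed to span zero to the largest observed distance, a range the text does not specify.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance for two aligned, gap-free sequences."""
    n = len(seq_a)
    transitions = transversions = 0
    for x, y in zip(seq_a, seq_b):
        if x == y:
            continue
        # A transition stays within purines or within pyrimidines.
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    p, q = transitions / n, transversions / n
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def frequency_distribution(distances, n_bins=80):
    """Count distances in n_bins equal-width bins spanning 0..max(distances)."""
    top = max(distances)
    if top <= 0:
        raise ValueError("need at least one positive distance")
    width = top / n_bins
    counts = [0] * n_bins
    for d in distances:
        # The largest distance is assigned to the last bin.
        counts[min(int(d / width), n_bins - 1)] += 1
    return width, counts


# Toy sequences (hypothetical), binned coarsely for readability.
toy = ["ACGTACGTAC", "ACGTACGTAT", "GCGTACGTAT", "ACTTACGGAC"]
distances = [k2p_distance(a, b) for a, b in combinations(toy, 2)]
width, counts = frequency_distribution(distances, n_bins=4)
print(width, counts)
```

With a real dataset, peaks in `counts` would correspond to the clusters of distances discussed in the text; the number and position of the peaks depend on the sequences and on the bin width.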

Because of the aberrant behaviour seen with genotype G (Fig. 1e, f), the X/core region (nt 1331–2291) was first excluded from analysis. With the X/core region excluded, the three-peak structure of the distribution was better delineated (Fig. 2a). A phylogenetic tree created using DNADIST and NEIGHBOR (Felsenstein, 1989) with the Kimura two-parameter model revealed two groups of genotypes: group I was composed of genotypes A–E and G, and group II of genotypes F and H (Fig. 2c).



 
Fig. 2. Frequency distribution of phylogenetic distances and a phylogenetic tree for the HBV genome without the X/core region. (a) Frequency distribution. Each bar represents a sequence divergence of 0.368 %. (b) Frequency distribution footprint of (a). Distances are segregated into genotype pairs represented by letter pairs on the left of the plot. Dark shaded boxes represent distance bins into which distance pairs fall without regard to frequency; light shaded boxes represent any genotype pair containing genotype G distances. The boxes marked 1, 2 and 3 represent intragenotypic, intragroup and intergroup distances, respectively. (c) Phylogenetic tree. Genotype-specific distances have been collapsed to a single node.

 
To examine this segregation further, we analysed frequency distribution footprints (Fig. 2b). These footprints showed segregation of sequence distances into intragenotypic distances (box 1), intergenotypic distances within a group (box 2) and intergroup distances (box 3). The left-most bimodal peak consisted of intra-subgenotype and individual isolate sequence distances (with divergence of less than 4 %) and inter-subgenotype distances (sequence divergence between 4 and 8 %). The bimodal pattern of these subgenotypic peaks was due to genotypes A, D and F splitting into subgenotypes (Bowyer & Sim, 2000; Norder et al., 2004).
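
A frequency distribution footprint of the kind described here records, for each genotype pair, which distance bins are occupied, without regard to how many distances fall into each bin. A minimal sketch follows; the function name `distance_footprint`, the input layout and the toy distances are assumptions made for illustration and are not values from the study.

```python
from collections import defaultdict


def distance_footprint(pair_distances, bin_width):
    """Map each unordered genotype pair to the sorted bins its distances occupy.

    pair_distances -- iterable of (genotype_1, genotype_2, distance)
    bin_width      -- width of one distance bin (same units as distance)
    """
    occupied = defaultdict(set)
    for g1, g2, d in pair_distances:
        # "BA" and "AB" describe the same genotype pair.
        key = "".join(sorted((g1, g2)))
        occupied[key].add(int(d / bin_width))
    return {key: sorted(bins) for key, bins in occupied.items()}


# Toy distances (hypothetical values).
toy = [
    ("A", "A", 0.012),
    ("A", "A", 0.051),
    ("B", "A", 0.093),
    ("A", "B", 0.097),
    ("A", "F", 0.148),
]
footprint = distance_footprint(toy, bin_width=0.01)
print(footprint)
```

Reading the occupied bins row by row for each genotype pair gives the layout of the footprint panels: low bins for intragenotypic pairs and progressively higher bins for more divergent pairs.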

Similar analysis with the X/core region alone (nt 1331–2291) yielded a phylogenetic tree in which HBV genotypes fell into three groups. The first group comprised genotypes A–E, the second genotypes F and H, and the third genotype G (Fig. 3c). Similar results were obtained using maximum-likelihood phylogenetic reconstructions (Guindon & Gascuel, 2003) on the set of nine benchmark sequences (data not shown). The frequency distribution footprint showed that all genotype G distances, other than those within genotype G, were now within the range of intergroup distances (Fig. 3b, box 3, light shaded bars). The movement of genotype G from group I to group III in the X/core region explains the deviation seen in the distance matrix plots, which included genotype G as one of the query sequences (Fig. 1e, f). The data are consistent with genotype G having resulted from a recombination event between the polymerase gene of a parental genotype G and the non-polymerase region of an as-yet-undiscovered HBV isolate, for which a complete nucleotide sequence is not known. Non-human primate HBV sequences were tested and none was found to be a parental donor for the X/core region of genotype G.



 
Fig. 3. Frequency distribution and phylogenetic tree for the X/core region (nt 1331–2291). The panels in this figure are the same as those in Fig. 2 with the following exceptions. In (a), each bar represents a sequence divergence of 0.377 %. In (b), Ba and Bj are subgenotypes of genotype B, CBa represents the mergence of genotypes C and Ba, and DE represents the mergence of genotypes D and E.

 
Thus, the phylogenetic analysis of different HBV genomic regions conducted in this study suggests that HBV genotypes can be classified into supragenotypic groups and that HBV genotype G is a recombinant between supragenotypic group I and hypothetical supragenotypic group III. The X/core region of genotype G is as divergent from all other genotypes as group I is divergent from group II. This observation suggests that the genotype G X/core region is derived from a highly divergent ancestor and thus represents the prototype of a new group (III). These findings confirm a recent report (Simmonds & Midgley, 2005) that HBV genotype G is a recombinant. However, in contrast to the present study, that report suggested that genotype G resulted from recombination between ancestral S gene sequences comparable in divergence to those between genotypes A and E, and a more divergent HBV variant contributing the rest of the genome.

Recombination has played a significant role in shaping some extant subgenotypes and genotypes (Bowyer & Sim, 2000; Sugauchi et al., 2002). Noticeable traces of intergenotypic recombination can be found within any region of the HBV genome (Simmonds & Midgley, 2005), with a ‘hot spot’ being located at nt 1627–3252, which includes the X/core region (Zhou & Holmes, 2007). However, it is important to note that the majority of these recombinants are represented by a single HBV strain, which suggests that all of these strains have a very recent origin and/or limited fitness benefits. Data presented here on genotype G recombination, as well as published data on recombination between genotypes D and E (Bowyer & Sim, 2000) and genotypes B and C (Sugauchi et al., 2002), indicate that only recombination between strains from distant HBV clades within the genomic region outside the polymerase gene (within the X/core region) may generate progeny strains with significant evolutionary advantages over the parental strains.

In conclusion, present-day HBV research is orientated towards examining HBV genotypes and subgenotypes. The finding of HBV genotypes being organized into three higher-order supragenotype groups leads to the understanding that HBV sequences are organized in a more complex hierarchy. This finding may have significant implications for HBV biology, pathobiology, vaccine-implementation strategies and diagnostic assay development.


   ACKNOWLEDGEMENTS
 
Disclaimer: This information is distributed solely for the purpose of pre-dissemination peer review under applicable information quality guidelines. It has not been formally disseminated by the Centers for Disease Control and Prevention/Agency for Toxic Substances and Disease Registry. It does not represent and should not be construed to represent any agency determination or policy.


   REFERENCES
 
Akuta, N. & Kumada, H. (2005). Influence of hepatitis B virus genotypes on the response to antiviral therapies. J Antimicrob Chemother 55, 139–142.

Arauz-Ruiz, P. H., Norder, H., Robertson, B. H. & Magnius, L. O. (2002). Genotype H: a new Amerindian genotype of hepatitis B virus revealed in Central America. J Gen Virol 83, 2059–2073.

Bollyky, P. L. & Holmes, E. C. (1999). Reconstructing the complex evolutionary history of hepatitis B virus. J Mol Evol 49, 130–141.

Bowyer, S. M. & Sim, J. G. M. (2000). Relationships within and between genotypes of hepatitis B virus at points across the genome: footprints of recombination in certain isolates. J Gen Virol 81, 379–392.

Fares, M. A. & Holmes, E. C. (2002). A revised evolutionary history of hepatitis B virus (HBV). J Mol Evol 54, 807–814.

Felsenstein, J. (1989). PHYLIP – phylogeny inference package (version 3.2). Cladistics 5, 164–166.

Guindon, S. & Gascuel, O. (2003). A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol 52, 696–704.

Hannoun, C., Horal, P. & Lindh, M. (2000). Long-term mutation rates in the hepatitis B virus genome. J Gen Virol 81, 75–83.

Kato, H., Orito, E., Gish, R. G., Bzowej, N., Newsom, M., Sugauchi, F., Suzuki, S., Ueda, R., Miyakawa, Y. & other authors (2002). Hepatitis B e antigen in sera from individuals infected with hepatitis B virus of genotype G. Hepatology 35, 922–929.

Lole, K. S., Bollinger, R. C., Paranjape, R. S., Gadkari, D., Kulkarni, S. S., Novak, N. G., Ingersoll, R., Sheppard, H. W. & Ray, S. C. (1999). Full-length human immunodeficiency virus type 1 genomes from subtype C-infected seroconverters in India, with evidence of intersubtype recombination. J Virol 73, 152–160.

Magnius, L. O. & Norder, H. (1995). Subtypes, genotypes and molecular epidemiology of the hepatitis B virus as reflected by sequence variability of the S-gene. Intervirology 38, 24–34.

Martin, D. & Rybicki, E. (2000). RDP: detection of recombination amongst aligned sequences. Bioinformatics 16, 562–563.

Nei, M., Kumar, S. & Takahashi, K. (1998). The optimization principle in phylogenetic analysis tends to give incorrect topologies when the number of nucleotides or amino acids used is small. Proc Natl Acad Sci U S A 95, 12390–12397.

Norder, H., Couroucé, A. M. & Magnius, L. O. (1992). Molecular basis of hepatitis B virus variations within the four major subtypes. J Gen Virol 73, 3141–3145.

Norder, H., Couroucé, A. M. & Magnius, L. O. (1994). Complete genomes, phylogenetic relatedness, and structural proteins of six strains of the hepatitis B virus, four of which represent two new genotypes. Virology 198, 489–503.

Norder, H., Couroucé, A.-M., Coursaget, P., Echevarria, J. M., Lee, S.-D., Mushahwar, I. K., Robertson, B. H., Locarnini, S. & Magnius, L. O. (2004). Genetic diversity of hepatitis B virus strains derived worldwide: genotypes, subgenotypes, and HBsAg subtypes. Intervirology 47, 289–309.

Okamoto, H., Tsuda, F., Sakugawa, H., Sastrosoewignjo, R. I., Imai, M., Miyakawa, Y. & Mayumi, M. (1988). Typing hepatitis B virus by homology in nucleotide sequence: comparison of surface antigen subtypes. J Gen Virol 69, 2575–2583.

Simmonds, P. & Midgley, S. (2005). Recombination in the genesis and evolution of hepatitis B virus genotypes. J Virol 79, 15467–15476.

Strimmer, K. & von Haeseler, A. (1996). Quartet puzzling: a quartet maximum likelihood method for reconstructing tree topologies. Mol Biol Evol 13, 964–969.

Stuyver, L., De Gendt, S., Van Geyt, C., Zoulim, F., Fried, M., Schinazi, R. F. & Rossau, R. (2000). A new genotype of hepatitis B virus: complete genome and phylogenetic relatedness. J Gen Virol 81, 67–74.

Sugauchi, F., Orito, E., Ichida, T., Kato, H., Sakugawa, H., Kakumu, S., Ishida, T., Chutaputti, A., Lai, C. L. & other authors (2002). Hepatitis B virus of genotype B with or without recombination with genotype C over the precore region plus the core gene. J Virol 76, 5985–5992.

Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F. & Higgins, D. G. (1997). The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25, 4876–4882.

Zhou, Y. & Holmes, E. C. (2007). Bayesian estimations of the evolutionary rate and age of hepatitis B virus. J Mol Evol 65, 197–205.

Received 17 August 2007; accepted 4 February 2008.


