1 Division of Infection and Immunity, Royal Free and University College Medical School, The Windeyer Building, 46 Cleveland Street, London W1T 4JF, UK
2 Department of Virology, UCLH NHS Foundation Trust, The Windeyer Building, 46 Cleveland Street, London W1T 4JF, UK
Correspondence
Richard Tedder
r.tedder{at}ucl.ac.uk
| ABSTRACT |
|---|
A supplementary table showing the NCBI reference sequence set is available in JGV Online.
| INTRODUCTION |
|---|
A number of host factors are recognized to affect the outcome of HBV infection; for example, infection in early life or in the face of immunosuppression favours carriage. Less is known about virological factors that might also influence the outcome of both acute and chronic infection. There is increasing evidence that different HBV genotypes may be associated with distinct disease profiles and differing responses to antiviral therapy. Studies in Taiwan and Japan have suggested that, in patients with cirrhosis, genotype C is more common, whereas hepatocellular carcinoma (HCC) is associated with genotype B in patients younger than 50 years and with genotype C in those older than 50 (Ding et al., 2001; Fujie et al., 2001; Kao et al., 2000). HBV genotype also appears to influence the viral response in carriers undergoing seroconversion from HBV e antigen (HBeAg) in serum to antibody to HBeAg (anti-HBe) (Delaney et al., 2001; Lok et al., 1994).
Sequence variation within the HBV genome has been classified into eight genotypes, A–H, which are defined by >8 % divergence across the full-length genome (Norder et al., 2004). These are distinct from serological subtypes, which are defined by the antigenicity of the HBV surface antigen (HBsAg) (Couroucé et al., 1983) as determined by amino acids at specific residues in the a determinant of HBsAg. Although advances in molecular biology, computational power and phylogenetic algorithms have facilitated characterization and genotyping of the full-length HBV genome, genotype is often determined more practically by sequencing a smaller region of the genome, usually the surface antigen-encoding region.
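The >8 % criterion refers to nucleotide divergence between complete genomes. As a rough illustration only, the sketch below computes an uncorrected pairwise p-distance between two pre-aligned sequences and compares it with the 8 % cut-off. The use of an uncorrected distance, and the skipping of gap and ambiguity sites, are assumptions made here; published genotype definitions may rely on other distance measures, and the function names are hypothetical.

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites between two aligned nucleotide sequences.

    Sites where either sequence carries a gap or an ambiguity code are
    skipped (an assumption made for this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            differing += a != b  # True counts as 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def different_genotype(seq1: str, seq2: str, cutoff: float = 0.08) -> bool:
    """True if two full-length genomes diverge by more than the 8 % cut-off."""
    return p_distance(seq1, seq2) > cutoff


# Hypothetical 10-base toy alignment: one difference in ten sites = 10 %.
print(p_distance("ACGTACGTAC", "ACGTACGTAA"))  # 0.1
print(different_genotype("ACGTACGTAC", "ACGTACGTAA"))  # True
```

In practice the comparison would be made over complete ~3.2 kb genomes; the toy strings only show the arithmetic.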
Antiviral therapy targeting the reverse-transcriptase activity of the viral polymerase is increasingly used in clinical practice to suppress virus replication in carriers. Not surprisingly, long-term use of single drugs in the face of continued replication is associated with viral mutational escape from the drug. This can be monitored by direct sequencing of the polymerase (pol) gene (Lai et al., 2003; Tenney et al., 2004), which entirely overlaps, in a different reading frame, the gene encoding HBsAg (s). Sequencing of pol therefore also generates s data, providing an increasingly large sequence repository that could be used for genotyping and potentially yielding both epidemiological data and further insight into host–parasite relationships.
There are currently few methods available for genotyping HBV. One approach is to build phylogenetic trees that include the novel sequences together with a set of reference sequences, but such trees require skill to interpret and are not reliable for outlying or recombinant sequences. There is also a web-based genotyping tool from the National Center for Biotechnology Information (NCBI; http://www.ncbi.nih.gov/projects/genotyping/formpage.cgi), which uses the BLAST algorithm along a sliding window to compare HBV nucleotide sequences with reference subtype sequences. However, it does not attempt to assign any confidence to the subtype prediction and does not allow batch submission of sequences. Here, we adapt our previously described high-throughput human immunodeficiency virus type 1 (HIV-1) pol subtyping tool, Subtype Analyser (STAR) (Gale et al., 2004; Myers et al., 2005), to genotype HBV using full-length genomes. We also describe genotyping of HBV using surface/polymerase gene sequences. Large datasets of HBV core gene and basal core promoter (BCP) sequences also exist, and we examine the utility of STAR analysis of these regions for predicting HBV genotype. A web interface to HBV STAR is available at http://www.vgb.ucl.ac.uk/starn.shtml.
| METHODS |
|---|
Genotyping analysis was performed using HBV STAR. This method converts genotype-specific alignments into a position-specific scoring matrix (PSSM) for each genotype and then compares a query sequence of unknown genotype with each PSSM, as described previously for HIV (Myers et al., 2005). The scores generated by this comparison (eight scores in the case of HBV) are transformed into Z scores, giving the eight-point distribution a mean of zero and a standard deviation of 1. The genotype PSSM generating the highest Z score gives the predicted genotype of the query sequence, and the magnitude of that Z score gives the confidence of the prediction. Z scores >2.0 are considered indicative of a significant genotype prediction. The PSSM generating the highest raw score also generates the highest Z score; however, the Z-score transformation normalizes HBV STAR scoring so that longer sequences do not generate arbitrarily higher scores.
When the query sequence is recombinant, two genotype PSSMs will produce high raw scores; however, the resulting maximal Z score will be lowered. Query sequences generating low Z-score predictions (<2.0) are therefore re-compared with the genotype PSSMs to detect putative HBV recombination in a two-stage process: the query sequence is first genotyped as described above, and a separate recombination analysis is then performed. Recombination detection used the difference in sequence identity relative to the ascribed genotype along a sliding window of 150 bp with a step interval of one base. Sequences containing a segment longer than 150 bp in which the mean sequence identity was closer to a different genotype, diverging from the ascribed genotype by >1 %, were considered potential recombinants. This procedure can be initiated automatically on the basis of a user-definable minimum Z-score threshold. Regions of the query sequence that are more similar to a PSSM other than that of the predicted genotype are identified by accumulating the difference between the normalized positional nucleotide frequencies.
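A minimal Python sketch of the PSSM scoring and Z-score step described above follows. The exact STAR scoring scheme is given by Myers et al. (2005) and is not reproduced here. This sketch assumes a log2-odds PSSM with a pseudocount of 1 against a uniform background, a population standard deviation for the Z transform, and a zero score for gaps or ambiguity codes. The function names are hypothetical, not the HBV STAR implementation.

```python
import math
import statistics
from typing import Dict, List, Tuple

BASES = "ACGT"
BACKGROUND = 0.25  # assumed uniform background frequency for each nucleotide


def build_pssm(alignment: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Build a log2-odds PSSM from equal-length aligned sequences of one genotype."""
    pssm = []
    for column in zip(*(seq.upper() for seq in alignment)):
        counts = {b: column.count(b) for b in BASES}  # gaps/ambiguity codes ignored
        total = sum(counts.values()) + pseudocount * len(BASES)
        pssm.append({b: math.log2((counts[b] + pseudocount) / total / BACKGROUND)
                     for b in BASES})
    return pssm


def raw_score(pssm: List[Dict[str, float]], query: str) -> float:
    """Sum the positional log-odds for the query; gaps/ambiguity codes score zero."""
    return sum(column.get(base, 0.0) for column, base in zip(pssm, query.upper()))


def z_scores(raw: Dict[str, float]) -> Dict[str, float]:
    """Standardize per-genotype raw scores to mean 0 and (population) SD 1."""
    values = list(raw.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:  # every genotype scored identically: no genotype signal
        return {g: 0.0 for g in raw}
    return {g: (s - mean) / sd for g, s in raw.items()}


def predict_genotype(raw: Dict[str, float],
                     threshold: float = 2.0) -> Tuple[str, float, bool]:
    """Return (top genotype, its Z score, whether that Z score exceeds the threshold)."""
    z = z_scores(raw)
    best = max(z, key=z.get)
    return best, z[best], z[best] > threshold
```

Under this population-SD assumption, eight raw scores with one clearly dominant value give a maximal Z score of √7 ≈ 2.65, above the 2.0 cut-off. Two equally high scores, as expected for a recombinant spanning two genotypes, give √3 ≈ 1.73, below it. This is one way to see why recombinants tend to fall into the low-Z group.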
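The sliding-window recombination check described above can be sketched as follows. This hedged example simplifies the method. It compares the query with one consensus sequence per genotype rather than with the normalized positional frequencies of each PSSM. It flags every 150-bp window (step of one base) in which another genotype's identity exceeds that of the ascribed genotype by more than 1 %, merges consecutive flagged windows, and reports segments longer than 150 bp. The consensus-sequence shortcut, the pre-aligned equal-length inputs and the function names are all assumptions.

```python
from typing import Dict, List, Optional, Tuple

Segment = Tuple[int, int, str]  # (start, end, alternative genotype); 0-based, end exclusive


def window_identity(query: str, reference: str, start: int, width: int) -> float:
    """Fraction of identical bases between query and reference in [start, start + width)."""
    pairs = zip(query[start:start + width], reference[start:start + width])
    return sum(q == r for q, r in pairs) / width


def find_recombinant_segments(query: str, consensus: Dict[str, str], ascribed: str,
                              window: int = 150, step: int = 1,
                              min_divergence: float = 0.01) -> List[Segment]:
    """Return segments longer than `window` bp where another genotype beats the ascribed one.

    Assumes `query` and every consensus sequence are aligned to the same length
    and that `consensus` holds at least one genotype besides `ascribed`.
    """
    segments: List[Segment] = []
    current: Optional[list] = None  # open segment as [start, end, genotype]
    for start in range(0, len(query) - window + 1, step):
        own = window_identity(query, consensus[ascribed], start, window)
        # Best-matching genotype other than the ascribed one in this window.
        alt, alt_identity = max(
            ((g, window_identity(query, seq, start, window))
             for g, seq in consensus.items() if g != ascribed),
            key=lambda item: item[1])
        if alt_identity - own > min_divergence:
            if current is not None and current[2] == alt and start <= current[1]:
                current[1] = start + window  # extend the open segment
                continue
            if current is not None:
                segments.append(tuple(current))
            current = [start, start + window, alt]
        elif current is not None:
            segments.append(tuple(current))
            current = None
    if current is not None:
        segments.append(tuple(current))
    # Keep only segments spanning more than a single window (in excess of 150 bp).
    return [seg for seg in segments if seg[1] - seg[0] > window]
```

For example, a 400-base toy query whose second half matches a different genotype's consensus would be reported as a single segment running from the first window where the alternative genotype wins by >1 % to the end of the sequence.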
Receiver operating characteristic (ROC) analysis was used to assess and compare the performance of HBV STAR by determining the sensitivity and specificity of genotype predictions over a range of Z-score thresholds. At each Z-score threshold examined, true positives (TP) were sequences assigned the correct genotype with a Z score above the threshold, and false negatives (FN) were those scoring below it. This was performed for the 464 HBV sequences used to define the eight genotype PSSMs. False positives (FP) and true negatives (TN) were assigned on the basis of genotype predictions for a set of non-human HBV sequences, FP scoring above and TN below the Z-score threshold. Sensitivity was calculated as TP/(TP+FN) and specificity as TN/(TN+FP).
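The TP/FN/FP/TN bookkeeping above translates directly into code. This is a sketch, not the authors' implementation. It assumes that reference sequences are supplied as (Z score, correctly assigned?) pairs and that "above the threshold" means strictly greater. It also assumes that a misassigned reference sequence counts as a false negative, a case the text does not spell out. The function names and toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(reference: Sequence[Tuple[float, bool]],
                            non_human_z: Sequence[float],
                            threshold: float) -> Tuple[float, float]:
    """Sensitivity and specificity of genotype calls at one Z-score threshold.

    reference   -- (Z score, correctly assigned?) for each reference HBV sequence
    non_human_z -- maximal Z score for each non-human HBV sequence
    """
    tp = sum(1 for z, correct in reference if correct and z > threshold)
    fn = len(reference) - tp  # below threshold, or (assumption) misassigned
    fp = sum(1 for z in non_human_z if z > threshold)
    tn = len(non_human_z) - fp
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(reference: Sequence[Tuple[float, bool]],
               non_human_z: Sequence[float],
               thresholds: Sequence[float]) -> List[Tuple[float, float, float]]:
    """(threshold, sensitivity, specificity) for each threshold in the scan."""
    return [(t, *sensitivity_specificity(reference, non_human_z, t)) for t in thresholds]


# Hypothetical toy data: four reference sequences and three non-human sequences.
ref = [(3.0, True), (2.5, True), (1.5, True), (2.2, True)]
non_human = [1.0, 2.1, 0.5]
print(sensitivity_specificity(ref, non_human, 2.0))  # (0.75, 0.6666666666666666)
```

Scanning `roc_points` over a grid of thresholds traces the trade-off: lowering the threshold raises sensitivity at the cost of admitting more non-human sequences as false positives.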
| RESULTS |
|---|
| DISCUSSION |
|---|
We assessed the performance of HBV STAR by comparing genotyping of HBV sequences of differing length. Reducing the length of the query sequence could result in a loss of genotype-specific signal, making errors in genotype prediction more likely. Comparison of genotyping sensitivity and specificity for subgenomic regions of HBV relative to full-length genomes showed that genotyping using the surface gene remained highly specific. In some ways this was not surprising, as the surface/polymerase-encoding region of HBV has long been used for phylogenetic assignment of genotype. This indicates a strong relationship between sequence variation and genotype, even though variation is constrained by the overlap of the surface and polymerase genes. When the surface gene and full-length sequences were reclassified within HBV STAR, the only classification errors were FN predictions, at a rate of 1.3 %. Failure to assign a genotype to a query sequence is safer than assigning an incorrect genotype, and such non-assignments were detected here on the basis of low Z scores. A proportion of the low-Z-scoring sequences (n=16) showed evidence of recombination. HBV STAR recombination analysis of full-length sequences identified true recombinant sequences (gi16751309, gi15419825, gi10443814, gi10443806 and gi10443822) as well as other putative recombinants. That HBV STAR generated no FP results during analysis of full-length and surface HBV sequences indicates that the tool performs well with these regions of the HBV genome.
In the absence of surface gene sequence, core and BCP sequences may give an indication of genotype. However, although BCP and core sequences were genotyped accurately 92.5 and 93.2 % of the time, respectively, there was a slightly increased risk of FP prediction (1.7 and 1.9 %). This risk rose to 4.5 % for predictions of genotypes A and D from BCP sequences and to 4 % for predictions of genotype A from core sequences. Even though they contain more variation than the surface gene, the core and BCP regions of the HBV genome are difficult to use as predictors of HBV genotype, because the sequence variation within these regions does not appear to be solely genotype-specific. We surmise that sequence variation within the HBV core and BCP may be a function of both genotype and interaction with the host. Seroconversion from HBeAg to anti-HBe in carriers is associated with changes in the core/precore-encoding region (Delaney et al., 2001; Lok et al., 1994). It seems entirely plausible that changes in the BCP affecting the binding of liver transcription factors are selected for, and that these selection-driven changes confound attempts to genotype on the basis of this region. These findings suggest that genotype prediction from HBV core and BCP sequences is likely to be more error-prone than prediction from full-length sequences and surface-encoding regions.
Here, we have developed a genotyping tool, HBV STAR, that can perform rapid, accurate and statistically robust analysis of HBV sequences without the need to generate phylogenetic trees. We have shown that it is accurate both for full-length HBV reference genomes and for clinically derived subgenomic sequences, and that it can detect recombinant genotypes. Its ease of use, flexibility with respect to sequence region and speed allow large databases of HBV clinical sequences to be analysed. As the number of HBV antiviral drugs inevitably increases, so will the requirement for direct sequencing of pol. Because the HBV surface gene overlaps pol, this will expand the dataset of surface sequences, and we therefore expect the utility of HBV STAR to increase accordingly. Whilst HBV genotypes have traditionally been associated with the geographical region from which the host originates, ever-increasing population movements and migrations mean that HBV genotypes will move into new geographical regions and new human populations. Identifying and tracking this movement will be critical both for national monitoring of infection rates and for predicting disease outcome and treatment response.
| ACKNOWLEDGEMENTS |
|---|
| REFERENCES |
|---|
Delaney, W. E., IV, Locarnini, S. & Shaw, T. (2001). Resistance of hepatitis B virus to antiviral drugs: current aspects and directions for future investigation. Antivir Chem Chemother 12, 1–35.
Ding, X., Mizokami, M., Yao, G., Xu, B., Orito, E., Ueda, R. & Nakanishi, M. (2001). Hepatitis B virus genotype distribution among chronic hepatitis B virus carriers in Shanghai, China. Intervirology 44, 43–47.
Felsenstein, J. (1996). Inferring phylogenies from protein sequences by parsimony, distance, and likelihood methods. Methods Enzymol 266, 418–427.
Fujie, H., Moriya, K., Shintani, Y., Yotsuyanagi, H., Iino, S. & Koike, K. (2001). Hepatitis B virus genotypes and hepatocellular carcinoma in Japan. Gastroenterology 120, 1564–1565.
Gale, C. V., Myers, R., Tedder, R. S., Williams, I. G. & Kellam, P. (2004). Development of a novel human immunodeficiency virus type 1 subtyping tool, Subtype Analyzer (STAR): analysis of subtype distribution in London. AIDS Res Hum Retroviruses 20, 457–464.
Kao, J. H., Chen, P. J., Lai, M. Y. & Chen, D. S. (2000). Hepatitis B genotypes correlate with clinical outcomes in patients with chronic hepatitis B. Gastroenterology 118, 554–559.
Lai, C.-L., Dienstag, J., Schiff, E. & 7 other authors (2003). Prevalence and clinical correlates of YMDD variants during lamivudine therapy for patients with chronic hepatitis B. Clin Infect Dis 36, 687–696.
Lok, A. S. F., Akarca, U. & Greene, S. (1994). Mutations in the pre-core region of hepatitis B virus serve to enhance the stability of the secondary structure of the pre-genome encapsidation signal. Proc Natl Acad Sci U S A 91, 4077–4081.
Lole, K. S., Bollinger, R. C., Paranjape, R. S., Gadkari, D., Kulkarni, S. S., Novak, N. G., Ingersoll, R., Sheppard, H. W. & Ray, S. C. (1999). Full-length human immunodeficiency virus type 1 genomes from subtype C-infected seroconverters in India, with evidence of intersubtype recombination. J Virol 73, 152–160.
Myers, R. E., Gale, C. V., Harrison, A., Takeuchi, Y. & Kellam, P. (2005). A statistical model for HIV-1 sequence classification using the subtype analyser (STAR). Bioinformatics 21, 3535–3540.
Norder, H., Couroucé, A.-M., Coursaget, P., Echevarria, J. M., Lee, S.-D., Mushahwar, I. K., Robertson, B. H., Locarnini, S. & Magnius, L. O. (2004). Genetic diversity of hepatitis B virus strains derived worldwide: genotypes, subgenotypes, and HBsAg subtypes. Intervirology 47, 289–309.
Tenney, D. J., Levine, S. M., Rose, R. E. & 14 other authors (2004). Clinical emergence of entecavir-resistant hepatitis B virus requires additional substitutions in virus already resistant to lamivudine. Antimicrob Agents Chemother 48, 3498–3507.
Thompson, J. D., Gibson, T. J., Plewniak, F., Jeanmougin, F. & Higgins, D. G. (1997). The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res 25, 4876–4882.
Received 5 December 2005; accepted 29 January 2006.