Short Communication
Department of Genetics, Anthropology and Evolution, University of Parma, Parco Area delle Scienze 11/A, I-43100 Parma, Italy
Correspondence
Angelo Pavesi
angelo.pavesi{at}unipr.it
| MAIN TEXT |
|---|
It has been claimed that strong constraints should be the rule in overlapping genes, as a single mutation can alter two amino acid sequences. A typical example of constrained evolution concerns the overlapping envelope and polymerase genes of Hepatitis B virus (Mizokami et al., 1997). The detection of unusually strong constraints at the third codon position in the large open reading frame (ORF) of Hepatitis C virus and Hepatitis G virus has supported the hypothesis that an additional protein is encoded by the overlapping frame (Pavesi, 2000; Walewski et al., 2001). The same feature has been associated, in gene P of Vesicular stomatitis virus, with the existence of a new overlapping ORF (Spiropoulou & Nichol, 1993).
Some recent papers, however, indicate that overlapping genes can exhibit a more flexible pattern of change. Simian immunodeficiency virus (Hughes et al., 2001), Potato leafroll virus (Guyader & Ducray, 2002) and Human papillomavirus (Narechania et al., 2005) all show a high rate of non-synonymous change in one reading frame (positive selection) with concurrent dominance of synonymous substitutions in the alternative frame (purifying selection).
Although several cases of gene overlap have been identified, less effort has been devoted to determining their origins. Usually, this task is performed with a phylogenetic approach, which compares homologous genes from a wide range of virus families. The presence of gene overlap in a given family, and its absence from all others, supports the following hypothesis: the ancestral frame is the one shared by all viruses, whilst the frame that originated by overprinting is typical of a few closely related viruses. By this approach, the origin of overlapping genes has been clarified in tymoviruses, luteoviruses, lentiviruses and paramyxoviruses (Keese & Gibbs, 1992; Jordan et al., 2000).
In this paper, we have investigated the origin and evolution of gene overlap in the bacteriophages that belong to the family Microviridae and infect Escherichia coli. The coliphages φX174, α3 and G4 show a similar genome structure, and the most convincing evidence of homology comes from the genes that overlap. In all phages, gene E is encoded entirely within gene D; likewise, gene B lies within gene A and gene K lies within genes A and C (Sanger et al., 1977; Godson et al., 1978; Kodaira et al., 1992).
Following the suggestion that out-of-frame expression of a gene often entails a bias at the third codon position (Keese & Gibbs, 1992), the origin of gene overlap was investigated by comparing the patterns of codon usage in overlapping and non-overlapping genes. This approach was preferred to the phylogenetic method described above, because the microviruses that infect E. coli present a genome structure rather different from those that parasitize Chlamydia psittaci, Spiroplasma melliferum and Bdellovibrio bacteriovorus (Renaudin et al., 1987; Storey et al., 1989; Brentlinger et al., 2002).
In total, 30 complete genome sequences of coliphages were collected from GenBank (accession nos AF176027–AF176034, AF299300–AF299314, J02482, AF274751, M14428, AF454431, V00657 and X60322–X60323). This corpus of data contains 24 isolates of φX174, two isolates of S13, two isolates of G4 and one isolate each of α3 and φK.
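As a quick consistency check, the accession ranges listed above can be expanded and counted. The Python sketch below is illustrative only: the helper name `expand_accessions` and the use of a plain hyphen as the range separator are assumptions made here, not part of the original study.

```python
import re

def expand_accessions(spec):
    """Expand comma-separated accession numbers, where 'AB001-AB003'
    denotes an inclusive range, into a flat list of accessions."""
    accessions = []
    for item in spec.split(","):
        item = item.strip()
        if "-" not in item:
            accessions.append(item)
            continue
        first, last = item.split("-")
        prefix, start = re.fullmatch(r"([A-Z]+)(\d+)", first).groups()
        _, end = re.fullmatch(r"([A-Z]+)(\d+)", last).groups()
        width = len(start)  # keep leading zeros in the numeric part
        accessions.extend(
            f"{prefix}{number:0{width}d}"
            for number in range(int(start), int(end) + 1)
        )
    return accessions

# Ranges as listed in the text (hyphen used here instead of an en-dash)
SPEC = ("AF176027-AF176034, AF299300-AF299314, J02482, AF274751, "
        "M14428, AF454431, V00657, X60322-X60323")
ACCESSIONS = expand_accessions(SPEC)
print(len(ACCESSIONS))
```

The expanded list contains 8 + 15 + 5 + 2 = 30 entries, in agreement with the number of genomes stated in the text.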
The protein-coding region of each genome sequence was separated into 13 segments. Four of them correspond to the non-overlapping genes J, F, G and H. Genes A, C and D were subdivided into the corresponding overlapping and non-overlapping regions, yielding a total of six segments. Genes B, E and K, all embedded entirely within other genes, yielded the remaining three segments.
The use of synonymous codons was evaluated with the relative synonymous-codon usage (RSCU) index (Sharp & Li, 1987). For each of the 59 degenerate codons, the RSCU value was calculated as follows:
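In the definition of Sharp & Li (1987), the RSCU of a codon is its observed count divided by the count expected if all synonymous codons for the same amino acid were used equally, so that a value of 1 indicates no bias. The Python sketch below illustrates that definition under the standard genetic code; it is not the authors' code, and the names `CODON_TABLE`, `FAMILIES`, `DEGENERATE_CODONS` and `rscu` are assumptions introduced here.

```python
from collections import Counter

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Group codons into synonymous families, dropping stop codons and the
# single-codon amino acids Met (ATG) and Trp (TGG).
FAMILIES = {}
for codon, amino_acid in CODON_TABLE.items():
    FAMILIES.setdefault(amino_acid, []).append(codon)
FAMILIES = {aa: fam for aa, fam in FAMILIES.items()
            if aa != "*" and len(fam) > 1}
DEGENERATE_CODONS = [codon for fam in FAMILIES.values() for codon in fam]

def rscu(sequence):
    """Return {codon: RSCU} for the degenerate codons of an in-frame sequence.

    RSCU = observed count * family size / total count of the family.
    The value is None when the amino acid does not occur at all.
    """
    codons = [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]
    counts = Counter(codons)
    values = {}
    for family in FAMILIES.values():
        total = sum(counts[codon] for codon in family)
        for codon in family:
            values[codon] = (counts[codon] * len(family) / total
                             if total else None)
    return values

# Toy example: four alanine codons (GCT twice, GCC once, GCA once)
example = rscu("GCTGCTGCCGCA")
print(len(DEGENERATE_CODONS), example["GCT"], example["GCG"])
```

In the toy sequence, alanine has four synonymous codons and occurs four times, so GCT (used twice) gets an RSCU of 2, GCC and GCA get 1, and the unused GCG gets 0.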
The main features of the pattern of codon usage in coliphages are illustrated in the two-dimensional map yielded by PCA (Fig. 1). The projection of points on axis 1, which accounts for 30 % of the total information, separated the 13 coding regions into two groups. Such a clustering reflects two distinct patterns of codon usage. Points at the extreme left of axis 1 (position-coordinate values from −3·1 to −5·5) correspond to the non-overlapping genes H, F and J. Points at the extreme right of axis 1 (position-coordinate values of 5·2 and 5·8) correspond to the overlapping genes E and K.
The map positions of genes A and B supported a different evolutionary pathway. Unlike gene D, the overlapping region of gene A was placed at a distance from its non-overlapping counterpart (position-coordinate values of 3·3 and 1·4, respectively). This result indicates the existence in gene A of two distinct patterns of codon usage, one similar to that of ancestral genes and the other to that of derived genes. Gene B shows a use of synonyms roughly similar to that of non-overlapping genes, as it was placed at the beginning of the left half of axis 1 (position-coordinate value of −0·52). These findings suggest that the coliphage genome originally possessed a short gene A, located in close proximity to gene B. Gene A probably reached its present length by taking advantage of a new termination codon beyond gene B.
Like gene A, the overlapping and non-overlapping regions of gene C were placed at a distance from each other (position-coordinate values of 4·1 and 0·6, respectively). Again, it can be hypothesized that an originally shorter gene C was extended by using an upstream ATG, located at the end of gene A, as its initiation codon.
The projection of points on axis 2, which accounts for 16 % of the total information, separated the overlapping gene K (position-coordinate value of 6·6) from all of the other coding regions (position-coordinate values from 4·0 to 1·9). The distinctiveness of gene K at both the first and second axes of ordination reflects a use of synonyms that is very different from that of all other genes, thus further supporting the view of a de novo origin by overprinting. The gradual increase in complexity of the phage genome from a putative ancestral state lacking gene overlap to a derived state with a high density of genetic information is summarized in Fig. 2.
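The ordination described above can be illustrated with a small Python sketch of PCA restricted to the first axis. It assumes a covariance-based PCA and finds the leading axis by power iteration; the choice of PCA variant, the function name `first_principal_axis` and the toy data are assumptions made for illustration only. In the setting of this study, each row would hold the 59 RSCU values of one of the 13 coding regions.

```python
from math import sqrt

def first_principal_axis(rows, iterations=200):
    """Project rows onto the first principal axis of their covariance matrix.

    rows: list of equal-length numeric lists (one row per coding region,
    one column per codon). Returns (scores, explained_fraction), where
    explained_fraction is the share of total variance carried by the axis.
    """
    n, m = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(m)]
    centered = [[row[j] - means[j] for j in range(m)] for row in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(m)]
           for a in range(m)]
    total_variance = sum(cov[j][j] for j in range(m))

    # Power iteration converges to the eigenvector with the largest
    # eigenvalue, provided the start vector is not orthogonal to it.
    axis = [float(j + 1) for j in range(m)]
    for _ in range(iterations):
        image = [sum(cov[a][b] * axis[b] for b in range(m)) for a in range(m)]
        norm = sqrt(sum(x * x for x in image))
        axis = [x / norm for x in image]

    eigenvalue = sum(axis[a] * sum(cov[a][b] * axis[b] for b in range(m))
                     for a in range(m))
    scores = [sum(c[j] * axis[j] for j in range(m)) for c in centered]
    return scores, eigenvalue / total_variance

# Toy data: four "regions" described by two perfectly correlated "codons"
toy = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
scores, explained = first_principal_axis(toy)
print(scores, explained)
```

The scores play the role of the position-coordinate values on an axis, and the explained fraction corresponds to the percentage of total information quoted for that axis. A second axis could be obtained by removing the first component from the centred data and repeating the procedure.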
Multiple alignment of the sequences of genes A/B and D/E was carried out with the CLUSTAL W program (Thompson et al., 1994). Sequences were taken from the most divergent coliphages (φX174, α3 and G4), after exclusion of redundant data. The predicted amino acid sequences were first aligned in CLUSTAL W and the respective codons were then placed on them. The amino-terminal region of protein A, at which the alignment postulated large insertions or deletions, was excluded. The rates of synonymous and non-synonymous substitution were estimated by the method of Nei & Gojobori (1986).
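For readers unfamiliar with this kind of estimate, the Python sketch below follows the general outline of the Nei & Gojobori (1986) approach for one pair of aligned coding sequences: count synonymous and non-synonymous sites, count differences averaged over mutational pathways, and apply the Jukes-Cantor correction. It is a simplified illustration, not the authors' implementation. In particular, the handling of stop codons (changes to stops counted as non-synonymous sites, pathways through stops discarded) and all function names are assumptions made here.

```python
from itertools import permutations
from math import log

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def syn_sites(codon):
    """Number of synonymous sites in a codon (each position contributes
    the fraction of its three possible changes that are synonymous)."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            mutant = codon[:pos] + base + codon[pos + 1:]
            if base != codon[pos] and CODE[mutant] == CODE[codon]:
                s += 1 / 3
    return s

def codon_diffs(c1, c2):
    """(synonymous, non-synonymous) differences between two codons,
    averaged over all orders in which the differing positions can change."""
    positions = [i for i in range(3) if c1[i] != c2[i]]
    syn = non = 0.0
    paths = 0
    for order in permutations(positions):
        current, s, n = c1, 0, 0
        for pos in order:
            step = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[step] == "*":
                break  # discard pathways that pass through a stop codon
            if CODE[step] == CODE[current]:
                s += 1
            else:
                n += 1
            current = step
        else:
            syn, non, paths = syn + s, non + n, paths + 1
    if paths == 0:  # assumption: treat unresolved cases as non-synonymous
        return 0.0, float(len(positions))
    return syn / paths, non / paths

def ng86_counts(seq1, seq2):
    """Return (S, N, Sd, Nd): sites and differences for two aligned sequences."""
    S = Sd = Nd = 0.0
    codons = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue  # skip gaps, ambiguous bases and stop codons
        S += (syn_sites(c1) + syn_sites(c2)) / 2
        sd, nd = codon_diffs(c1, c2)
        Sd, Nd, codons = Sd + sd, Nd + nd, codons + 1
    return S, 3 * codons - S, Sd, Nd

def jukes_cantor(p):
    """Correct a proportion of differences for multiple hits (needs p < 0.75)."""
    return -0.75 * log(1 - 4 * p / 3)

def kn_ks(seq1, seq2):
    """Return (Kn, Ks) for two aligned, in-frame coding sequences."""
    S, N, Sd, Nd = ng86_counts(seq1, seq2)
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)

# Toy pair: one synonymous (TTA/TTG) and one non-synonymous (AAA/AGA) change
kn, ks = kn_ks("TTAGCTAAA", "TTGGCTAGA")
print(kn, ks, kn / ks)
```

A Kn/Ks ratio below 1 is read as purifying selection and a ratio above 1 as positive selection. Very short inputs such as the toy pair above give unstable estimates and serve only to show the mechanics.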
As reported in Table 1, the non-overlapping region of gene A shows a rate of non-synonymous substitution per site (Kn) about six times lower than that of synonymous substitution (Ks), yielding a Kn/Ks ratio significantly lower than unity (0·16). As a similar Kn/Ks ratio (0·33) was also found in the region of overlap, it can be hypothesized that a substantial proportion of amino acid changes in protein A has been eliminated by purifying selection. The relatively high rate of non-synonymous change in gene B (Kn=0·56) is due to the overlap between the third codon position of gene A and the first codon position of gene B. Being greater than unity, the Kn/Ks ratio found in gene B (1·24) is indicative of adaptive or positive selection.
These findings suggest that the rate of nucleotide change in the two-phase coding regions of genes A and D, albeit considerably lower than that of the one-phase coding regions (see the absolute values of Kn and Ks in Table 1), does not preclude a pattern of adaptive evolution in the corresponding overlapping genes, B and E.
Finally, the origins of genes A/B and D/E were further investigated by using Pearson's correlation test, with the aim of validating the results provided by PCA. The RSCU values from each overlapping gene were compared with those obtained from the whole set of non-overlapping genes. A highly significant correlation (P<0·001) was found in genes B and D (r values of 0·53 and 0·69, respectively), whilst a lack of correlation was found in gene E (r=0·01). These findings support the hypothesis that genes B and D are ancestral genes, whereas gene E is a more recent gene.
Analysis of gene A yielded two remarkably different r values: a high degree of correlation in the non-overlapping part (r=0·88) and a poor degree of correlation in the overlapping part (r=0·02). This result is in accordance with the hypothesis stated above, that is, a shorter gene A that evolved by using the sequence of gene B in a different translational frame.
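The correlation test can be sketched in a few lines of Python. The functions below compute Pearson's r for two vectors of RSCU values and the corresponding t statistic; the function names are hypothetical, and the assumption that each comparison involves all 59 degenerate codons is made here for illustration and is not stated in the text.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson's product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r * r))

# Toy vectors standing in for RSCU values of a gene and of the reference set
gene = [0.5, 1.0, 1.5, 2.0]
reference = [1.0, 2.0, 3.0, 4.0]
print(pearson_r(gene, reference))

# Assumption: 59 codons per comparison, r as reported for gene B
print(t_statistic(0.53, 59))
```

Under the 59-codon assumption, r=0·53 corresponds to a t value of about 4·7 with 57 degrees of freedom, which is consistent with the reported significance level of P<0·001.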
The findings on overlapping genes presented here can be discussed by taking into account their role during the infectious cycle. For example, the overlapping gene E plays a crucial role in the final step of infection, as it encodes a protein causing lysis of the host cell (Bläsi & Lubitz, 1985). It is important to note that protein E is not an essential structural component of the phage, as normal phage particles are produced in the absence of lysis (Hutchinson & Sinsheimer, 1966). Thus, the acquisition of gene E by overprinting can be viewed as an evolutionary advantage favouring the diffusion of mature phages. Interestingly, a similar role has been assigned to a new overlapping gene found in tymoviruses (Bozarth et al., 1992).
Another gene that arose by overprinting, gene K, also encodes a non-essential protein, as demonstrated by the finding that mutants of φX174 are viable even when they make no detectable K protein (Tessman et al., 1980). However, a study by Gillam et al. (1985) assigned a phenotype to protein K, as it demonstrated that mutant phages lacking gene K show a burst size sixfold lower than that of wild-type phages. Again, a beneficial effect for phage growth is provided by a gene that originated by overprinting. The lack of mutational studies on the overlapping region of genes A and C does not enable us to make inferences on the effects of gene overlap.
The proposed ancestry of genes B and D is consistent with their function during the phage life cycle. Both genes encode essential structural proteins that are required for the phage procapsid to be formed (Dokland et al., 1997). Positive selection found in gene B is consistent with the detection of a big-benefit mutation that allows phage growth at high temperature (Bull et al., 2000).
Although the presence of overlapping genes in coliphages has long been recognized, studies on their origins are rather fragmentary. The method presented here is based on a detailed analysis of the codon-usage pattern. It takes advantage of the use of PCA, a multivariate statistical technique capable of providing a low-dimensional representation for large amounts of data. As shown in Fig. 1, the first two axes of ordination are sufficient to detect the main patterns of codon usage. By this approach, the increase in genome complexity during coliphage evolution can be appreciated (Fig. 2).
The correlation test (between an individual overlapping gene and the entire set of non-overlapping genes) should be considered as an auxiliary tool, to meet the objection that the use of synonyms in short genes can be affected by biased amino acid composition. More generally, the utility of our method lies in the fact that it overcomes the need for a phylogenetic analysis. Thus, it could be an adequate tool for investigating the origins of gene overlap, especially in those viruses with poor phylogenetic information.
| REFERENCES |
|---|
Bläsi & Lubitz (1985). … φX174 lysis gene E on its lysis-inducing properties. J Gen Virol 66, 1209–1213.
Bozarth, C. S., Weiland, J. J. & Dreher, T. W. (1992). Expression of ORF-69 of turnip yellow mosaic virus is necessary for viral spread in plants. Virology 187, 124–130.
Brentlinger, K. L., Hafenstein, S., Novak, C. R., Fane, A. B., Borgon, R., McKenna, R. & Agbandje-McKenna, M. (2002). Microviridae, a family divided: isolation, characterization, and genome sequence of φMH2K, a bacteriophage of the obligate intracellular parasitic bacterium Bdellovibrio bacteriovorus. J Bacteriol 184, 1089–1094.
Bull, J. J., Badgett, M. R. & Wichman, H. A. (2000). Big-benefit mutations in a bacteriophage inhibited with heat. Mol Biol Evol 17, 942–950.
Dokland, T., McKenna, R., Ilag, L. L., Bowman, B. R., Incardona, N. L., Fane, B. A. & Rossmann, M. G. (1997). Structure of a viral procapsid with molecular scaffolding. Nature 389, 308–313.
Gibbs, A. & Keese, P. K. (1995). In search of the origins of viral genes. In Molecular Basis of Virus Evolution, pp. 76–90. Edited by A. Gibbs, C. H. Calisher & F. García-Arenal. Cambridge: Cambridge University Press.
Gillam, S., Atkinson, T., Markham, A. & Smith, M. (1985). Gene K of bacteriophage φX174 codes for a protein which affects the burst size of phage production. J Virol 53, 708–709.
Godson, G. N., Barrell, B. G., Staden, R. & Fiddes, J. C. (1978). Nucleotide sequence of bacteriophage G4 DNA. Nature 276, 236–247.
Guyader, S. & Ducray, D. G. (2002). Sequence analysis of Potato leafroll virus isolates reveals genetic stability, major evolutionary events and differential selection pressure between overlapping reading frame products. J Gen Virol 83, 1799–1807.
Hughes, A. L., Westover, K., da Silva, J., O'Connor, D. H. & Watkins, D. I. (2001). Simultaneous positive and purifying selection on overlapping reading frames of the tat and vpr genes of simian immunodeficiency virus. J Virol 75, 7966–7972.
Hutchinson, C. A., III & Sinsheimer, R. L. (1966). The process of infection with bacteriophage φX174. X. Mutations in a φX lysis gene. J Mol Biol 18, 429–447.
Jordan, I. K., Sutter, B. A., IV & McClure, M. A. (2000). Molecular evolution of the paramyxoviridae and rhabdoviridae multiple-protein-encoding P gene. Mol Biol Evol 17, 75–86.
Keese, P. K. & Gibbs, A. (1992). Origin of genes: "big bang" or continuous creation? Proc Natl Acad Sci U S A 89, 9489–9493.
Kodaira, K., Nakano, K., Okada, S. & Taketo, A. (1992). Nucleotide sequence of the genome of the bacteriophage α3: interrelationship of the genome structure and the gene products with those of the phages, φX174, G4 and φK. Biochim Biophys Acta 1130, 277–288.
Krakauer, D. C. (2000). Stability and evolution of overlapping genes. Evolution 54, 731–739.
Miyata, T. & Yasunaga, T. (1978). Evolution of overlapping genes. Nature 272, 532–535.
Mizokami, M., Orito, E., Ohba, K., Ikeo, K., Lau, J. Y. N. & Gojobori, T. (1997). Constrained evolution with respect to gene overlap of hepatitis B virus. J Mol Evol 44 (Suppl. 1), S83–S90.
Morrison, D. F. (1976). Multivariate Statistical Methods. New York: McGraw-Hill.
Narechania, A., Terai, M. & Burk, R. D. (2005). Overlapping reading frames in closely related human papillomaviruses result in modular rates of selection within E2. J Gen Virol 86, 1307–1313.
Nei, M. & Gojobori, T. (1986). Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol 3, 418–426.
Pavesi, A. (2000). Detection of signature sequences in overlapping genes and prediction of a novel overlapping gene in hepatitis G virus. J Mol Evol 50, 284–295.
Pavesi, A., De Iaco, B., Granero, M. I. & Porati, A. (1997). On the informational content of overlapping genes in prokaryotic and eukaryotic viruses. J Mol Evol 44, 625–631.
Renaudin, J., Pascarel, M.-C. & Bové, J.-M. (1987). Spiroplasma virus 4: nucleotide sequence of the viral DNA, regulatory signals, and proposed genome organization. J Bacteriol 169, 4950–4961.
Sander, C. & Schulz, G. E. (1979). Degeneracy of the information contained in amino acid sequences: evidence from overlaid genes. J Mol Evol 13, 245–252.
Sanger, F., Air, G. M., Barrell, B. G., Brown, N. L., Coulson, A. R., Fiddes, C. A., Hutchinson, C. A., Slocombe, P. M. & Smith, M. (1977). Nucleotide sequence of bacteriophage φX174 DNA. Nature 265, 687–695.
Sharp, P. M. & Li, W.-H. (1987). The codon adaptation index – a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res 15, 1281–1295.
Spiropoulou, C. F. & Nichol, S. T. (1993). A small highly basic protein is encoded in overlapping frame within the P gene of vesicular stomatitis virus. J Virol 67, 3103–3110.
Storey, C. C., Lusher, M. & Richmond, S. J. (1989). Analysis of the complete nucleotide sequence of Chp1, a phage which infects avian Chlamydia psittaci. J Gen Virol 70, 3381–3390.
Tessman, E. S., Tessman, I. & Pollock, T. J. (1980). Gene K of bacteriophage φX174 codes for a nonessential protein. J Virol 33, 557–560.
Thompson, J. D., Higgins, D. G. & Gibson, T. J. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res 22, 4673–4680.
Walewski, J. L., Keller, T. R., Stump, D. D. & Branch, A. D. (2001). Evidence for a new hepatitis C virus antigen encoded in an overlapping reading frame. RNA 7, 710–721.
Received 25 July 2005; accepted 20 December 2005.