Horizontal Gene Transfer and Genome Organization
It is well known that lateral gene transfer complicates the reconstruction of organismal phylogeny, which is an interesting computational problem in its own right. But biologically speaking, how much can horizontal gene transfer (HGT) shape a genome? Can we derive interesting statistics on HGT segments? How do we identify them? The rest of this post touches on these questions.
Using a few statistical tools, Lawrence and Ochman quantified the number of genes that were horizontally transferred into Escherichia coli. Their analysis is based on the following ideas. First, laterally transferred segments show a significant bias in GC content: they reflect the codon bias or the average GC content of the organism(s) from which they were derived. Second, each ORF is placed on a plot of the chi-square statistic of its codon usage vs. its Codon Adaptation Index (CAI), so that every gene becomes a point in this space. For a given gene in an organism, the CAI measures how similar its codon usage is to that of the highly expressed genes in that organism. In other words, genes with a high chi-square statistic and a low CAI i) have a strongly biased codon usage of their own, but ii) do not match the preferred codon usage of the organism itself. This makes it a very useful plot in computational genomics.
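To make these three quantities concrete, here is a minimal Python sketch. It is not the authors' code: the function names are my own, the chi-square is taken against uniform usage within each synonymous codon family (one common convention, assumed here), the CAI follows the usual geometric-mean definition, and the 0.5 pseudocount for codons missing from the reference set is an assumption as well.

```python
import math
from collections import Counter

# Standard genetic code, bases ordered T, C, A, G ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Synonymous families with more than one codon (Met, Trp and stops excluded).
FAMILIES = {}
for _codon, _aa in CODON_TABLE.items():
    if _aa != "*":
        FAMILIES.setdefault(_aa, []).append(_codon)
FAMILIES = {aa: cs for aa, cs in FAMILIES.items() if len(cs) > 1}


def codons(seq):
    """Split an in-frame coding sequence into codons, dropping a trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def gc_content(seq):
    """Fraction of G or C bases in the whole sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def gc3(seq):
    """Fraction of G or C at third codon positions."""
    third = [c[2] for c in codons(seq)]
    return sum(base in "GC" for base in third) / len(third)


def chi_square_codon_usage(seq):
    """Chi-square of observed codon counts vs. uniform usage within each family."""
    counts = Counter(codons(seq))
    chi2 = 0.0
    for family in FAMILIES.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid not used in this gene
        expected = total / len(family)
        chi2 += sum((counts[c] - expected) ** 2 / expected for c in family)
    return chi2


def relative_adaptiveness(reference_seqs):
    """w(codon) = count / count of the most used synonymous codon in the reference set."""
    counts = Counter()
    for seq in reference_seqs:
        counts.update(codons(seq))
    w = {}
    for family in FAMILIES.values():
        best = max(counts[c] for c in family)
        if best == 0:
            continue  # family absent from the reference set
        for c in family:
            w[c] = max(counts[c], 0.5) / best  # 0.5 pseudocount: an assumption
    return w


def cai(seq, w):
    """Geometric mean of w over the codons of seq that have a w value."""
    logs = [math.log(w[c]) for c in codons(seq) if c in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")
```

In practice, `relative_adaptiveness` would be fed a set of highly expressed genes from the organism (ribosomal protein genes are a typical choice), and each ORF would then be plotted at (`cai(orf, w)`, `chi_square_codon_usage(orf)`). ORFs with a high chi-square, a low CAI and an unusual `gc3` would be the candidates worth a closer look.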
Let’s get to the results. The authors show that 17.6% of the predicted ORFs in E. coli, around 755 ORFs, arose through horizontal gene transfer! It is then interesting to ask where these HGT segments tend to be located. It appears that they are preferentially located near tRNA loci - quite interesting, isn’t it? Note also that this preferential integration is usually the result of the insertion of elements from phages. Phages, therefore, have helped enormously in shaping the genome of E. coli. (The paper also notes an interesting association of IS elements with the HGT segments, if you are interested.)
Now, the reader may have thought of the simple idea of “amelioration”: over time, the sequence statistics of HGT segments evolve to mirror the native genome’s characteristics more closely. Using the substitution rates and the mutational bias in E. coli, the authors show that it is possible to quantify the rate of amelioration. They derive very interesting statistics from this analysis - for example, E. coli has accumulated 1600 kb of novel genes through HGT alone since it diverged from S. enterica!
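To get a feel for amelioration, here is a deliberately simplified Python sketch. It is not the model from the paper. It assumes that the GC content of a transferred segment relaxes exponentially from the donor's value toward the host's equilibrium value at a single constant rate; the function names and every number in the usage note are hypothetical.

```python
import math


def ameliorated_gc(gc_donor, gc_host, rate, t):
    """GC content of a transferred segment after time t.

    Assumed toy model: exponential relaxation from gc_donor toward gc_host,
    with `rate` expressed per unit of time (e.g. per million years).
    """
    return gc_host + (gc_donor - gc_host) * math.exp(-rate * t)


def time_since_transfer(gc_now, gc_donor, gc_host, rate):
    """Invert the toy model: how long ago did the segment arrive?"""
    ratio = (gc_now - gc_host) / (gc_donor - gc_host)
    if not 0 < ratio <= 1:
        raise ValueError("gc_now must lie between gc_host (exclusive) and gc_donor")
    return -math.log(ratio) / rate
```

For example, with made-up values, `time_since_transfer(0.45, 0.40, 0.50, 0.01)` asks how long a segment that arrived at 40% GC would need to reach 45% GC in a host sitting at 50% GC. The catch is that the donor's GC content is normally unknown, so a single curve like this one cannot date a transfer by itself.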
References and Further Reading:
1. Lawrence JG, Ochman H (1998) Molecular archaeology of the Escherichia coli genome. Proceedings of the National Academy of Sciences 95:9413-9417.
2. Doolittle WF (1999) Phylogenetic Classification and the Universal Tree. Science 284:2124-2128.
3. Lawrence JG, Ochman H (1997) Amelioration of bacterial genomes: rates of change and exchange. Journal of Molecular Evolution 44:383-397.