The coffee genome provides insight into the convergent evolution of caffeine biosynthesis
(A) The principal caffeine biosynthetic pathway. Three methylation steps are necessary to produce caffeine from xanthosine, involving the successive action of three NMTs: xanthosine methyltransferase (XMT), theobromine synthase [7-methylxanthine methyltransferase (MXMT)], and caffeine synthase [3,7-dimethylxanthine methyltransferase (DXMT)]. SAM, S-adenosylmethionine; SAH, S-adenosylhomocysteine. (B) Evolutionary position of caffeine-producing plants with respect to other eudicots (phylogeny adapted from www.mobot.org/MOBOT/research/APweb/). (C) ML phylogeny of coffee, tea, and cacao NMTs. Bootstrap support values (percentages) from 1000 replicates are shown next to relevant clades. Branch lengths are proportional to expected numbers of nucleotide substitutions per site. Colors identify genes assignable to the genomic blocks denoted in (D). (D) (Left) A model summarizing the duplication history of coffee NMT genes, following the phylogeny in (C). Three distinct tandem gene arrays evolved in situ on chromosome 1 from nearby gene duplicates (bold squares). The red and green blocks, colored as in (C), translocated (to chromosome 9) or rearranged (to elsewhere on chromosome 1) from their ancestral locus (blue region), respectively. (Right) Gene orders on modern chromosomes. Translocation of the red block, containing the putative caffeine NMT metabolic cluster, left the phylogenetically derived CcDXMT gene behind. Similarly, CcNMT19 is a derived gene within its own NMT clade that remained in place following movement of the green block. Numbers at branches indicate relative times since major duplication events or diversification times of the tandem arrays, calculated from approximately neutral synonymous substitution rates. (E) Expression profiles (reads per kilobase per million reads mapped) of known Coffea canephora NMTs. The genes in the putative metabolic cluster (along with CcDXMT and CcMXMT) exhibit similar expression patterns, higher in perisperm than endosperm.
Data are plotted as log2 values. DAP, days after pollination.
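The expression unit in (E), reads per kilobase per million reads mapped, followed by a log2 transform, can be illustrated with a minimal sketch. The function names, the example numbers, and the pseudocount are assumptions for illustration only. The caption does not say how zero values were handled before the log2 transform, and this is not the authors' pipeline.

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads.

    Hypothetical helper: RPKM = reads * 1e9 / (length_bp * total_mapped_reads).
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def log2_rpkm(value, pseudocount=1.0):
    """log2-transform an RPKM value.

    The pseudocount is an assumption that keeps zero expression finite;
    the source does not state whether one was used.
    """
    return math.log2(value + pseudocount)


# Made-up numbers: 500 reads on a 2,000 bp gene in a library of 10 million mapped reads.
example = rpkm(500, 2000, 10_000_000)
print(example)             # RPKM value
print(log2_rpkm(example))  # value as it would appear on a log2 axis
```

Dividing by gene length and by library size is what makes profiles comparable between genes and between tissue samples such as perisperm and endosperm.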