dc.description.abstract | The genome of the allotetraploid species Coffea arabica L. was sequenced to assemble independently
the two component subgenomes (putatively deriving from C. canephora and C. eugenioides) and
to perform a genome-wide analysis of the genetic diversity in cultivated coffee germplasm and in
wild populations growing in the center of origin of the species. We assembled a total length of 1.536
Gbp, of which 444 Mb and 527 Mb were assigned to the canephora and eugenioides subgenomes,
respectively, and predicted 46,562 gene models, of which 21,254 and 22,888 were assigned to the
canephora and to the eugenioides subgenome, respectively. Through a genome-wide SNP genotyping
of 736 C. arabica accessions, we analyzed the genetic diversity in the species and its relationship with
geographic distribution and historical records. We observed a weak population structure due to low-frequency
derived alleles and highly negative values of Tajima’s D, suggesting a recent and severe
bottleneck, most likely resulting from a single event of polyploidization, not only for the cultivated
germplasm but also for the entire species. This conclusion is strongly supported by forward simulations
of mutation accumulation. However, PCA revealed a cline of genetic diversity reflecting a west-to-east
geographical distribution from the center of origin in East Africa to the Arabian Peninsula. The
extremely low levels of variation observed in the species, as a consequence of the polyploidization
event, make the exploitation of diversity within the species for breeding purposes less interesting
than in most crop species and stress the need for introgression of new variability from the diploid
progenitors. | |