The oral microbiome is one of the most diverse microbial niches in the human body, including over 600 different microorganisms. It is in continual contact with the environment and has been shown to be susceptible to many environmental effects, including tobacco use, romantic partners, and cohabitation. The microbes reside in sub-niches throughout the oral cavity, including the tongue, cheek, and teeth. The salivary microbiome has been shown to be representative of many of the oral microbiome niches, which is thought to be because microorganisms from the oral cavity surfaces shed into the saliva. Previous salivary microbiome studies have identified specific microbiota that are present in almost all individuals, referred to as the core microbiome. Saliva is also easily accessible, making it ideal for population surveys in microbiome studies. In this paper, we describe an unbiased, two-step approach to studying the effects of human genes on the oral microbiome. The first step uses twin information to establish heritable phenotypes related to the microbiome; the second identifies DNA sequence variation associated with the most highly heritable traits. Because 16S rRNA sequence information yields a large number of potential phenotypes, the twin study can be used to identify the most heritable ones, which are therefore the phenotypes most likely to be mapped in the association study. A key strength of this approach is that the two steps rely on independent data, which reduces the multiple-testing burden and the power lost to type 1 error control in the association test. Refining a phenotype before carrying out an association study can increase the likelihood of detecting specific SNPs that influence it. Using the largest oral microbiome twin study to date, we show that multiple phenotypes of the salivary microbiome are heritable.
Using these phenotypes, we identify promising host gene candidates that may play a role in establishing the oral microbiome, in a genome-wide association study of a separate sample.
Samples from 1504 twins (including 111 within-twin longitudinal samples) with at least 3500 reads, and DNA samples from 1481 unrelated individuals with at least 3000 reads, produced 2664 and 2679 OTUs, respectively. All samples were rarefied to 2500 sequences to retain as many samples as possible and thereby improve power, with little effect on the results. To avoid analyzing OTUs that resulted from sequencing or PCR error, OTUs that were present in fewer than 2 subjects or observed fewer than 10 times in total were removed, resulting in 895 OTUs in the twins and 931 OTUs in the unrelated individuals. One unrelated individual was later removed during analysis because of cryptic relatedness, leaving 1480 people in the unrelated sample.
β-diversity was analyzed with Bray-Curtis and UniFrac distances using QIIME and R. Analyses included 366 MZ pairs, 386 DZ pairs, and 37,832 unrelated pairs, the latter formed from age- and DNA-collection-year-matched non-cotwin pairs drawn from the twin sets. P values were calculated as previously described. In short, the pair labels were permuted 10,000 times and the W test statistic was collected from each permutation. The P value was then calculated as the number of permuted W statistics greater than the observed W, plus 1, divided by the number of permutations plus 1. Biplot analyses were performed as implemented in QIIME. In experiments where cohabitation was required, only cotwins aged 18 and under, and those over 18 who reported cohabitating, were included; this removed from the total twin sample 328 subjects who were living apart from their cotwin. This population of 588 twin pairs is referred to as the “cohabitation sample.” Cohen’s d effect sizes for β-diversity measurements were calculated using the R package ‘effectsize’.
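The β-diversity comparison and permutation P value described above can be sketched in pure Python. This is an illustration under stated assumptions, not the QIIME/R code used in the study: the text does not define W, so we assume it is the Wilcoxon rank-sum statistic computed on within-pair distances and taken over the DZ pairs; the function names `bray_curtis`, `rank_sum_w`, and `permutation_p` are hypothetical; and UniFrac is omitted because it requires a phylogenetic tree. The P value follows the stated formula, (number of permuted W greater than the observed W + 1) / (number of permutations + 1).

```python
import random

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two OTU count dicts (0 = identical, 1 = nothing shared)."""
    otus = set(a) | set(b)
    num = sum(abs(a.get(o, 0) - b.get(o, 0)) for o in otus)
    den = sum(a.get(o, 0) + b.get(o, 0) for o in otus)
    return num / den

def rank_sum_w(target, other):
    """Wilcoxon rank-sum W: sum of the ranks of `target` in the pooled sample (ties share the average rank)."""
    pooled = sorted([(v, 0) for v in target] + [(v, 1) for v in other])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    return sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)

def permutation_p(dz_dists, mz_dists, n_perm=10_000, seed=1):
    """P = (number of permuted W greater than observed W + 1) / (n_perm + 1).

    W is taken over the DZ distances, so a large W means DZ pairs are less
    similar than MZ pairs. Each permutation shuffles the MZ/DZ labels across pairs.
    """
    rng = random.Random(seed)
    observed = rank_sum_w(dz_dists, mz_dists)
    pooled = list(dz_dists) + list(mz_dists)
    n_dz = len(dz_dists)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if rank_sum_w(pooled[:n_dz], pooled[n_dz:]) > observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

# Toy example with made-up counts and distances (not study data).
print(bray_curtis({"otu1": 6, "otu2": 4}, {"otu1": 4, "otu2": 6}))  # 0.2
mz = [0.10, 0.20, 0.15, 0.12, 0.18]
dz = [0.50, 0.60, 0.55, 0.45, 0.65]
p = permutation_p(dz, mz, n_perm=999)  # 0.001: the observed W is already the maximum possible
```

Adding 1 to both numerator and denominator keeps the estimated P value above zero even when no permutation exceeds the observed statistic.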
Microbial traits included taxonomic groups, OTUs, α-diversity measurements, and principal coordinates from β-diversity measurements, with all perfectly correlated traits collapsed. Microbial traits were then processed within each population separately: twin pairs, European (EUR) unrelated, and Admixture American (ADM) unrelated. Traits were transformed to z-scores and then categorized as either continuous or categorical. The Shapiro-Wilk test was performed using the R package ‘stats’. Categorical traits were then binned based on a z-score transformation of all non-zero values: zero counts; less than or equal to −1; greater than −1 and less than or equal to 0; greater than 0 and less than or equal to 1; and greater than 1. Some traits failed to categorize due to lack of variation, resulting in the final trait counts: twins, EUR unrelated, ADM unrelated. Only the continuous traits were used in the EUR and ADM populations, so data are provided only for those traits. Descriptions of all traits can be found in Additional file 1: Tables S11–14.
The MZ and DZ ICC values were calculated using the R package ‘irr’ and were compared using the Wilcoxon signed-rank test function in the R package ‘stats’. ICC values were calculated for all taxonomic groups that were categorized as continuous traits. The P value was calculated as previously described: the zygosity labels of the twin pairs were randomized 10,000 times and the ICC values recalculated. This analysis compared the overall distribution of ICC values for MZ twin pairs with that for DZ twin pairs. Because the entire distribution was compared rather than each taxon individually, multiple testing correction was not needed. In addition, ICC values for the remaining 17 continuous traits were determined.
Genome-wide Complex Trait Analysis (GCTA) was performed on all traits categorized as continuous in both the twin and unrelated populations using the GCTA software.
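The five-bin scheme for categorical traits can be illustrated with a short sketch. It assumes z-scores use the sample standard deviation (as R's `scale` does) and that a trait with fewer than two distinct non-zero values cannot be binned, mirroring traits that "failed to categorize due to lack of variation"; `bin_trait` is a hypothetical name, and the Shapiro-Wilk step that decides which traits are categorical is not reproduced.

```python
import statistics

def bin_trait(values):
    """Assign each observation of a zero-inflated trait to one of five bins.

    Bin 0: zero count. Non-zero values are z-scored among themselves, then
    bin 1: z <= -1, bin 2: -1 < z <= 0, bin 3: 0 < z <= 1, bin 4: z > 1.
    """
    nonzero = [v for v in values if v != 0]
    if len(nonzero) < 2:
        raise ValueError("not enough non-zero values to z-score")
    mean = statistics.mean(nonzero)
    sd = statistics.stdev(nonzero)  # sample SD (n - 1 denominator); an assumption
    if sd == 0:
        raise ValueError("no variation among non-zero values")
    bins = []
    for v in values:
        if v == 0:
            bins.append(0)
            continue
        z = (v - mean) / sd
        if z <= -1:
            bins.append(1)
        elif z <= 0:
            bins.append(2)
        elif z <= 1:
            bins.append(3)
        else:
            bins.append(4)
    return bins

print(bin_trait([0, 1, 2, 3, 4, 5]))  # [0, 1, 2, 2, 3, 4]
```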
The GCTA analysis was performed on the cleaned, imputed genotypes described above in the European sample. The following covariates were included in the model: age; sex; sequencing run; year DNA was collected; saliva collection method for 16S sequencing; DNA collection method for host genotyping; and the first 10 PCs to control for population stratification.
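GCTA works from a genetic relationship matrix (GRM) estimated from the genotypes, and the text applies a 0.025 relatedness threshold before estimation. A minimal sketch, assuming a small individuals-by-SNPs dosage matrix: `grm` applies the standardized-genotype formula A_jk = (1/M) Σ_i (x_ij − 2p_i)(x_ik − 2p_i) / (2p_i(1 − p_i)) to every entry (GCTA itself uses a different estimator on the diagonal), and `prune_related` is a hypothetical greedy rule that drops the individual involved in the most pairs at or above the cutoff. Using GRM entries in place of the paper's IBS estimates, and the greedy rule itself, are assumptions; neither is claimed to match GCTA's internal procedure.

```python
import math

def grm(dosages):
    """GRM from an individuals x SNPs dosage matrix (0, 1 or 2 copies of the counted allele)."""
    n, m = len(dosages), len(dosages[0])
    standardized = []
    for i in range(m):
        column = [row[i] for row in dosages]
        p = sum(column) / (2 * n)  # allele frequency in this sample
        variance = 2 * p * (1 - p)
        if variance == 0:  # monomorphic SNP carries no information
            continue
        sd = math.sqrt(variance)
        standardized.append([(x - 2 * p) / sd for x in column])
    if not standardized:
        raise ValueError("no polymorphic SNPs")
    used = len(standardized)
    return [[sum(s[j] * s[k] for s in standardized) / used for k in range(n)]
            for j in range(n)]

def prune_related(A, cutoff=0.025):
    """Greedily drop individuals until every remaining off-diagonal entry is below `cutoff`."""
    kept = set(range(len(A)))
    while True:
        counts = {}
        for j in kept:
            for k in kept:
                if j < k and A[j][k] >= cutoff:
                    counts[j] = counts.get(j, 0) + 1
                    counts[k] = counts.get(k, 0) + 1
        if not counts:
            return kept
        # Remove the individual in the most flagged pairs; ties go to the lowest index.
        worst = min(counts, key=lambda person: (-counts[person], person))
        kept.discard(worst)

# Toy genotypes (made up): individuals 0 and 1 are identical, 2 is the opposite homozygote.
dosages = [
    [0, 1, 2, 1],
    [0, 1, 2, 1],
    [2, 1, 0, 1],
]
A = grm(dosages)        # A[0][1] is about 0.5, A[0][2] about -1.0
kept = prune_related(A)  # {1, 2}: individual 0 is dropped from the related pair
```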
GCTA estimates for the Admixture American sample were not reported because of the small sample size remaining after applying the threshold of IBS estimates less than 0.025.
Genome-wide association study (GWAS) analyses were performed using the software EPACTS with the Q.EMMAX function, analyzing the dosage information for each variant. The GWAS analyses were performed separately in the ADM and EUR ancestry groups. For both analyses, a kinship matrix and the first 10 principal components were included to control for population stratification within each ancestry sample. In addition, the following covariates were included in the model: age; sex; sequencing run; year DNA was collected; saliva collection method for 16S sequencing; DNA collection method for host genotyping; and tobacco use. The kinship matrix was created from all 22 autosomes using the kinship function in EPACTS. To rule out the possibility that stratification or the computational method influenced the results, three additional analyses were carried out using different programs and approaches to controlling for population stratification: EPACTS with only the kinship matrix made from all SNPs; PLINK with the first 10 PCs; and GCTA with the leave-one-out kinship matrix. For all methods, the following covariates were included in the model: age; sex; sequencing run; year DNA was collected; saliva collection method for 16S sequencing; and DNA collection method for host genotyping.
We performed an analysis of 752 twin pairs from the Colorado Twin Registry to estimate host genetic and environmental contributions to salivary microbiome composition. The sample included 366 monozygotic (MZ) pairs, 263 same-sex dizygotic (DZ) pairs, and 123 opposite-sex DZ pairs, ranging from 11 to 24 years of age.
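The per-variant association test in the GWAS methods regresses a trait on variant dosage plus covariates. The sketch below shows only the fixed-effect part: an ordinary least squares fit of y ~ 1 + dosage + covariates, closer in spirit to the PLINK-with-PCs check than to Q.EMMAX, which additionally models a kinship random effect that is omitted here. The helper names `solve_normal_equations` and `snp_test` are hypothetical, the pure-Python normal equations are for illustration only (not genome scale), and the toy data are made up.

```python
import math

def solve_normal_equations(X, y):
    """Solve (X'X) beta = X'y by Gauss-Jordan elimination; also return (X'X)^-1."""
    p = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    # Augmented rows [X'X | I | X'y] are reduced to [I | (X'X)^-1 | beta].
    aug = [xtx[a] + [1.0 if a == b else 0.0 for b in range(p)] + [xty[a]] for a in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))  # partial pivoting
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    inverse = [row[p:2 * p] for row in aug]
    beta = [row[2 * p] for row in aug]
    return beta, inverse

def snp_test(dosage, covariates, y):
    """Fit y ~ 1 + dosage + covariates by OLS; return (beta, SE) for the dosage term."""
    X = [[1.0, float(d)] + [float(c) for c in cov] for d, cov in zip(dosage, covariates)]
    beta, inverse = solve_normal_equations(X, y)
    n, p = len(X), len(X[0])
    residuals = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    sigma2 = sum(r * r for r in residuals) / (n - p)
    return beta[1], math.sqrt(sigma2 * inverse[1][1])

# Toy data (made up): one covariate (age); trait is exactly 1 + 0.3*dosage + 0.1*age.
dosage = [0, 1, 2, 0, 1, 2, 1, 0]
ages = [[12], [15], [18], [20], [14], [11], [22], [16]]
trait = [1 + 0.3 * d + 0.1 * a[0] for d, a in zip(dosage, ages)]
beta, se = snp_test(dosage, ages, trait)  # beta is about 0.3; se is near 0 for noise-free data
```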
Taxonomic analyses, based on sequencing of variable region IV of the 16S rRNA amplicon prepared from each twin’s saliva, were carried out using QIIME on high-quality Illumina MiSeq paired-end reads as previously reported. From the 2664 operational taxonomic units (OTUs) found, we determined the phylum abundances of Firmicutes, Proteobacteria, Bacteroidetes, Actinobacteria, and Fusobacteria, which is consistent with the “core” salivary microbiome that we and others have previously reported. All of our analyses included only OTUs that were present in at least 2 subjects and observed at least 10 times in total after rarefying at 2500 reads. This filtering yielded 895 OTUs that were considered in all subsequent experiments. Comparing mean β-diversity among MZ pairs, DZ pairs, and unrelated individuals allows assessment of microbial population differences between groups. With either the Bray-Curtis or the weighted UniFrac measure of β-diversity, MZ twin pairs were significantly more similar to each other than DZ twin pairs were, and for all 3 β-diversity measurements, both MZ and DZ twin pairs were significantly more similar to each other than unrelated individuals were. This analysis was also carried out with abundant OTUs and with all OTUs, with very similar results.
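The rarefy-then-filter preprocessing stated above (subsample each sample to 2500 reads, then keep OTUs present in at least 2 subjects and observed at least 10 times in total) can be sketched in pure Python. This is an illustration, not QIIME's implementation: rarefaction is assumed to mean random subsampling without replacement, samples below the target depth are dropped, and `rarefy` and `filter_otus` are hypothetical names.

```python
import random
from collections import Counter

def rarefy(counts, depth, rng):
    """Subsample one sample's OTU counts to `depth` reads without replacement.

    counts: dict mapping OTU id -> read count. Returns None if the sample has
    fewer than `depth` reads, so the caller can drop it.
    """
    if sum(counts.values()) < depth:
        return None
    # Expand counts into individual reads, then draw `depth` of them.
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    return dict(Counter(rng.sample(pool, depth)))

def filter_otus(table, min_subjects=2, min_total=10):
    """Keep OTUs seen in at least `min_subjects` samples AND at least `min_total` reads overall."""
    prevalence, abundance = Counter(), Counter()
    for sample in table.values():
        for otu, n in sample.items():
            if n > 0:
                prevalence[otu] += 1
                abundance[otu] += n
    keep = {otu for otu in prevalence
            if prevalence[otu] >= min_subjects and abundance[otu] >= min_total}
    return {sid: {o: n for o, n in s.items() if o in keep} for sid, s in table.items()}

# Toy table (made up): "low" has too few reads; otu2 and otu3 each occur in one sample only.
rng = random.Random(0)
raw = {
    "twin_A": {"otu1": 3000, "otu2": 600},
    "twin_B": {"otu1": 2000, "otu3": 900},
    "low": {"otu1": 100},
}
rarefied = {}
for sid, counts in raw.items():
    sub = rarefy(counts, 2500, rng)
    if sub is not None:
        rarefied[sid] = sub
filtered = filter_otus(rarefied)  # only otu1 survives in twin_A and twin_B
```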
Repeated rarefactions at 2500 reads produced consistent results, so a single rarefaction to 2500 reads is shown for subsequent analyses. We detected no significant effect of sex on any β-diversity measure when comparing same-sex with opposite-sex dizygotic twin pairs, perhaps because the sample size did not provide enough power to differentiate sex effects from interindividual variation. Opposite-sex pairs were therefore included in subsequent DZ analyses. The Colorado Twin Registry includes highly detailed phenotypic information that is invaluable in identifying and controlling for environmental confounders that may play an important role. Living together is a covariate influencing microbial populations in humans. It is well known that MZ twins tend to cohabitate longer than DZ twins, and our previous work has shown that shared environment influences the oral microbiome. It was therefore possible that the tendency of MZ cotwins to live together longer was driving the observed heritability. To examine this potential confounder, we reanalyzed the data in Fig. 1a, using questionnaire data to restrict the analysis to cohabitating pairs only. Ideally, we would also have analyzed only twin pairs living apart, but our sample size did not permit it. As seen in Fig. 1b, MZ pairs remained significantly more similar to each other than DZ twin pairs for the Bray-Curtis and weighted UniFrac measurements, and this was also observed in the abundant and unfiltered/unrarefied OTU tables described above. We conclude that cohabitation does not play a significant role in the observed microbiome heritability. To quantify the differences between groups, Cohen’s d effect sizes were calculated for all β-diversity measurements, both for the full sample and for the sample limited to cohabitating twin pairs. Comparisons between the unrelated and twin pairs yielded medium to large effect sizes.
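The Cohen's d comparison above can be sketched as follows, assuming the pooled-standard-deviation form of d for two independent groups (for example, unrelated-pair distances versus MZ-pair distances). The text does not state its interpretation rules; the 0.2, 0.5, and 0.8 cutoffs below are Cohen's conventional benchmarks, and labeling values under 0.2 as "negligible" is an assumption. `cohens_d` and `label` are hypothetical names, and the inputs are made up.

```python
import math
import statistics

def cohens_d(x, y):
    """Cohen's d for two independent groups using the pooled standard deviation."""
    n1, n2 = len(x), len(y)
    v1, v2 = statistics.variance(x), statistics.variance(y)  # sample variances (n - 1)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(x) - statistics.mean(y)) / pooled_sd

def label(d):
    """Conventional benchmarks (assumed): <0.2 negligible, <0.5 small, <0.8 medium, else large."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"

x = [1, 2, 3, 4, 5]
y = [2, 3, 4, 5, 6]
d = cohens_d(x, y)  # about -0.632, i.e. -1 / sqrt(2.5)
print(label(d))     # medium
```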
All other comparisons yielded small or negligible effect sizes, the largest being between MZ and DZ pairs for Bray-Curtis. To quantify the effect of cohabitation on β-diversity measurements, effect sizes comparing all twin pairs with only the pairs living together were calculated for all measurements; these comparisons yielded only negligible effect sizes, consistent with the conclusion that cohabitation was not driving the observed heritability. The stability of the oral microbiome over time in adults is reported to be remarkably high relative to that of other body sites. To confirm and extend this observation, we assessed the stability of the oral microbiome in longitudinal samples collected 2–7 years apart from 111 individuals in our cohort. The mean β-diversity between each individual’s longitudinal samples was compared with the mean between unrelated individuals of different ages. For all three β-diversity measurements examined, subjects were significantly more similar to themselves than unrelated individuals were to each other. Intraclass correlation coefficients (ICCs) are useful for estimating the heritability of individual observations within a group of related observations; the higher the ICC values for MZ pairs relative to DZ pairs, the greater the heritability. As shown in Fig. 2, ICC values for essentially all abundant taxa are significantly greater in MZ than in DZ pairs. No significant difference was observed between same-sex and opposite-sex DZ pairs across the taxa analyzed. The taxa analyzed were those categorized as continuous. Significance was established with Wilcoxon signed-rank tests, strongly supporting the heritability of taxon abundance in this twin set. We also tested 4 different α-diversity measures and the first 3 principal coordinates of three different β-diversity measurements, and found that most traits were consistent with the conclusion that MZ cotwins are more similar than DZ cotwins.
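The MZ versus DZ ICC comparison rests on a per-pair ICC for each trait. The text does not say which ICC form was used with the R package ‘irr’; this sketch assumes the one-way random-effects, single-measure form (irr's default model), ICC = (MS_between − MS_within) / (MS_between + (k − 1) MS_within) with k = 2 members per pair. `icc_oneway` is a hypothetical name, the toy abundances are made up, and the permutation of zygosity labels used for the P value is not reproduced.

```python
import statistics

def icc_oneway(pairs):
    """One-way random-effects ICC for twin pairs given as [(twin1, twin2), ...]."""
    n, k = len(pairs), 2
    grand = statistics.mean(v for pair in pairs for v in pair)
    pair_means = [sum(pair) / k for pair in pairs]
    # Between-pair and within-pair mean squares from a one-way ANOVA.
    ms_between = k * sum((m - grand) ** 2 for m in pair_means) / (n - 1)
    ms_within = sum((v - m) ** 2 for pair, m in zip(pairs, pair_means) for v in pair) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Made-up abundances for one taxon: MZ cotwins track each other closely, DZ cotwins less so.
mz = [(1.0, 1.1), (2.0, 2.1), (3.0, 2.9), (4.0, 4.2)]
dz = [(1.0, 2.5), (2.0, 3.5), (3.0, 1.5), (4.0, 2.8)]
mz_icc, dz_icc = icc_oneway(mz), icc_oneway(dz)
```

In the one-way model the ICC can fall below zero when within-pair variation exceeds between-pair variation, as it nearly does for the toy DZ pairs.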