We ranked the continuous traits by heritability and performed a genome-wide association study (GWAS) of the top six with the Efficient and Parallelizable Association Container Toolbox (EPACTS). Restricting the analysis in this way would be expected to reduce the loss of power due to multiple testing of hundreds of phenotypes. The family Carnobacteriaceae was excluded from the GWAS analyses because it was highly correlated with the genus Granulicatella, and the latter has a more refined taxonomic resolution. It is well established that continuous traits afford greater power in both twin studies and GWAS. Therefore, although some categorical phenotypes showed high twin heritability, we studied only continuous traits by GWAS. All analyses were controlled for age, sex, and sequencing run, among other covariates. Analyses were performed independently in the two major ancestry groups of the unrelated sample, European (EUR) and Admixture (ADM). Because of the limited size of the ADM sample, only the EUR sample is discussed here; the ADM sample was considered only in the meta-GWAS discussed below. To control for population stratification, a kinship matrix created from all the chromosomes and the first ten principal components from the LD-pruned SNPs were included as covariates. To account for the six traits tested, the genome-wide significance level was lowered to 8.33e-09. Using this threshold, we found that the genus Granulicatella was significantly associated with the SNP chr7:110,659,581, located within an intron of the IMMP2L gene on chromosome 7. This gene is known to be involved in mitochondrial protein trafficking. The regional Manhattan plots in Fig. 4b show that the peak locus includes SNPs of decreasing r2 values around the peak SNP, lending greater confidence to the association. Without a replication sample this result is provisional but potentially interesting.
Repeating the analysis with PLINK 1.9, which takes categorical imputed genotypes as input rather than the probabilistic dosage calls produced by imputation, gave results consistent with this association, showing that the association is independent of the underlying computational method.
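The adjusted significance level quoted above can be reproduced with a simple Bonferroni division. The sketch below assumes the conventional single-GWAS threshold of 5e-08 as the starting point; that base value is an assumption here, chosen because it yields the reported 8.33e-09 when divided by the six traits tested.

```python
# Conventional genome-wide significance level for one GWAS (assumed base value).
GENOME_WIDE_ALPHA = 5e-8


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Divide the per-test significance level by the number of tests performed."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


# Six heritable traits were carried forward to GWAS.
threshold = bonferroni_threshold(GENOME_WIDE_ALPHA, 6)
print(f"{threshold:.2e}")  # prints 8.33e-09
```

The same function gives the stricter level needed when more phenotypes are tested, which is the cost that ranking by heritability is meant to limit.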
A comparison of the 100 SNPs with the lowest P values in each of the six phenotypes examined in the European sample revealed that 7 SNPs were shared between at least two of the phenotypes. Bray-Curtis PCo2, Unweighted UniFrac PCo2, and Weighted UniFrac PCo2, all β-diversity measures, were the phenotypes that most often shared SNPs. After the initial analyses of the 6 most heritable traits, a GWAS was completed for the remaining 64 continuous traits in the European sample. We used a relatively conservative approach to controlling for population stratification. To evaluate whether this may have produced false negatives, we repeated the GWAS with EPACTS kinship only, PLINK with 10 PCs, and GCTA LOCO. Each consistently identified the same SNP at chr7:110,659,581 as significantly associated with the trait, along with associated nearby SNPs in high LD. No additional significant SNPs were identified, consistent with the hypothesis that the stratification methodology had little effect on identifying the top SNPs and that we were not “over filtering” with rigorous kinship controls. For completeness, we then carried out GWAS analyses for the remaining 64 continuous microbial phenotypes using the EPACTS kinship-only analysis, adjusting significance for the additional multiple testing, and found no significantly associated SNPs. This is perhaps not surprising given the relatively small sample size.
The size of the ADM sample made it unlikely to produce statistically significant results on its own. To glean useful information from it, we combined it with the EUR data described above using a meta-analysis approach that can effectively deal with the population issues inherent in mixing samples from different populations. METAL is such a meta-analysis package: it takes as input individual SNP P values and the direction of their effects, weighted by sample size, to arrive at composite P values. The test statistics were also corrected for population stratification.
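The combination of P values, effect directions, and sample sizes described above can be sketched as a sample-size-weighted Z-score calculation. This is a minimal illustration of that general scheme, not a reproduction of the settings used in this study; the function names and the example numbers are hypothetical.

```python
from math import erfc, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def signed_z(p_value, direction):
    """Turn a two-sided P value and an effect direction (+1 or -1) into a signed Z score."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    # inv_cdf(p/2) is the lower-tail quantile (non-positive); negate it to get the magnitude.
    magnitude = -_STD_NORMAL.inv_cdf(p_value / 2.0)
    return direction * magnitude


def sample_size_weighted_meta(studies):
    """Combine (p_value, direction, n) tuples; each study is weighted by sqrt(n).

    Returns the combined Z score and its two-sided P value.
    """
    weighted_sum = 0.0
    total_n = 0.0
    for p_value, direction, n in studies:
        if n <= 0:
            raise ValueError("sample size must be positive")
        weighted_sum += sqrt(n) * signed_z(p_value, direction)
        total_n += n  # sum of squared weights, because each weight is sqrt(n)
    if total_n == 0.0:
        raise ValueError("at least one study is required")
    z = weighted_sum / sqrt(total_n)
    p = erfc(abs(z) / sqrt(2.0))  # two-sided P value of the combined Z score
    return z, p


# Hypothetical per-sample results for one SNP: (P value, direction, sample size).
combined_z, combined_p = sample_size_weighted_meta([(1e-6, 1, 700), (0.04, 1, 150)])
print(combined_z, combined_p)
```

Because the weights scale with the square root of the sample size, a small sample can reinforce or weaken a signal from the larger one but rarely dominates it.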
The METAL analysis identified the same suggestively significant SNP on chromosome 7 that was associated with Granulicatella abundance in the EUR GWAS. However, owing to the small size of the ADM sample, this SNP did not survive quality filtering in the METAL analysis, so the ADM sample was not a factor in the outcome for this SNP. Analysis of Unweighted UniFrac PCo3 yielded a SNP on chromosome 12 that reached genome-wide significance in the combined sample, with the same direction of effect, though it was not robust to multiple-testing correction.
Again, the regional Manhattan plots in Fig. 5c show that the peak locus includes SNPs of decreasing r2 around the peak SNP, consistent with the association. The minor allele, C, was associated with lower z-scored PCo3 values.
The most promising single-SNP association occurred with the phenotype defined as the abundance of the genus Granulicatella. We reanalyzed the association data with the gene-based tool Knowledge-based mining system for Genome-Wide Genetic studies (KGG4). This analysis pointed to a gene involved in protein processing associated with mitochondrial import and to a non-coding antisense RNA, INHBA-AS1. A SNP in INHBA-AS1 had previously been identified in a dental caries GWAS, along with a locus in the INHBA gene. INHBA is thought to be important to tooth development, which could have interesting implications for the oral microbiome. In the meta-GWAS of PCo3 of Unweighted UniFrac, the most highly associated region was in the gene LIN7A on chromosome 12.
It was possible that tobacco or other factors influenced our observation of genetic association. For example, Streptococcus abundance, a highly heritable phenotype, has also been shown to change in smokers. In addition, other substances could potentially change the oral microbiome. Among these are alcohol and marijuana, though their effects have yet to be determined. However, marijuana use is correlated with poor oral health, which is often indicative of changes in the oral microbiota. Self-reported tobacco, alcohol, and marijuana use over the previous six months was available for 92% of our subjects. We therefore repeated the analyses using the three substances as covariates. As seen in Additional file 2: Figures S15 and S16, controlling for tobacco, alcohol, and marijuana use had negligible impact on the top hit on chromosome 7 for the genus Granulicatella.
For the 6 highly heritable continuous traits that were analyzed, results appear to be consistent with and without substance use covariates.
We have shown that microbe abundance and some aspects of the microbial population structure in saliva are influenced by heritable factors.
We have ranked the “most heritable” traits using ACE/ADE modeling and GCTA-based SNP heritability and carried out an unbiased GWAS on the 6 most heritable traits. One SNP on chromosome 7 in the gene IMMP2L reached genome-wide significance. Another gene on chromosome 7, INHBA-AS1, achieved genome-wide significance when analyzed by KGG4, which relies on a composite association score including all SNPs in each known gene. The significance of these associations was not influenced by the “p-hacking” statistical biases common in GWAS because phenotype choice was not based on previous association tests. This approach is a model for using heritability to reduce the multiple testing problems seen in many GWAS reports, and it could be the method of choice in the design of GWAS in which sample size may be limited. Bray-Curtis, Weighted UniFrac, and to a lesser extent Unweighted UniFrac β-diversity demonstrate that many components of the microbiome community are heritable. While a shared environment and behavioral habits contribute to a more similar microbiome, such studies did not control well for the clear genetic influences in their populations. When we examined the differences among MZ and DZ cotwins and age-matched unrelated individuals who we were confident cohabitated, the genetic influences remained clear. It is significant that the genetic effects are detected using measures that include all detectable OTUs. To assess heritable influences on individual microbial components, we carried out intraclass correlation (ICC) analyses, which show that heritability extends across nearly all observed taxa individually. The one exception is the Fusobacteria, where ICC does not distinguish MZ and DZ twins. Possibly these organisms, known to be “bridges” between early and late colonizers on gum and tooth surfaces, do not interact with host proteins and could lack human genetic influences.
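The logic of comparing MZ and DZ intraclass correlations can be illustrated with Falconer's classical approximation, which converts the two correlations into rough additive genetic (A), shared environment (C), and unique environment (E) proportions. This is a simplified stand-in offered for intuition only; it is not the ACE/ADE model fitting used in this study, and the input correlations below are hypothetical.

```python
def falconer_components(r_mz, r_dz):
    """Approximate variance proportions from MZ and DZ twin correlations.

    A = 2 * (r_mz - r_dz)  additive genetic proportion
    C = 2 * r_dz - r_mz    shared environment proportion
    E = 1 - r_mz           unique environment proportion (includes error)
    """
    for r in (r_mz, r_dz):
        if not -1.0 <= r <= 1.0:
            raise ValueError("correlations must lie in [-1, 1]")
    a = 2.0 * (r_mz - r_dz)
    c = 2.0 * r_dz - r_mz
    e = 1.0 - r_mz
    return a, c, e


# Hypothetical correlations for one microbial trait.
a, c, e = falconer_components(r_mz=0.6, r_dz=0.4)
print(round(a, 3), round(c, 3), round(e, 3))  # prints 0.4 0.2 0.4
```

When the MZ and DZ correlations are equal, as reported above for the Fusobacteria, the additive estimate is zero. A negative C estimate (DZ correlation below half the MZ correlation) is the usual cue to consider a dominance (ADE) model instead.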
GWAS of complex traits in relatively small samples is problematic because of the lack of statistical power. The influence of individual genes on traits that have multiple genetic components may be small. Moreover, the microbiome is a highly complex population with interacting networks of bacteria, all of which may have multiple interactions with the host. A variety of covarying network modeling approaches have demonstrated how complex these communities are. It has been shown that, assuming the number of causal variants and their frequency spectra are similar for a pair of traits, the more heritable trait is more likely to be detectable in GWAS. Therefore we focused on those microbiome endophenotypes with the greatest additive genetic heritability for GWAS. Both ACE/ADE modeling and GCTA SNP heritability are suited to this approach. The microbial phenotypes with the greatest additive genetic influence in the ACE/ADE model on the entire twin cohort were the abundance of OTU4483015, which corresponds to an unnamed species of Granulicatella, and PCo2 of Bray-Curtis. The influence of additive genetics varied by trait when heritability in the full sample was compared with heritability only among cotwins who cohabitate. The variation in estimates may reflect environmental effects or loss of power between the full sample and the cohabitating sample. This again points to the complex nature of the microbe-host interactions in primarily aerobic and anaerobic environments and how human genetic influences must also be complex. As a further test of heritability prior to GWAS, we examined SNP-based heritability in our unrelated sample with GCTA. A positive correlation was observed between the ACE/ADE and GCTA ‘heritability’ estimates for continuous traits in both the full twin sample and the EUR sample. Previous studies have demonstrated that large samples are needed to produce results reaching statistical significance using GCTA.
In their original paper, Yang et al. showed that while increasing the sample size does decrease the error bars of the heritability estimates, the heritability estimates themselves remain relatively stable. While the GCTA estimate was not significant after correction for multiple testing, the positive correlation between the estimates from unrelated individuals and from the twin studies supports the conclusion that, for these continuous traits, genetic variation influences microbial populations. A GWAS of the six most heritable continuous traits determined from the twin modeling was carried out in the European population. The GWAS of the abundance of the genus Granulicatella identified a genome-wide significant SNP on chr7. This SNP is located in an intron of the IMMP2L gene. The GWAS meta-analyses combining the EUR and ADM samples using METAL with the same 6 traits provided no new information about the chr7 SNP because of its low frequency in the ADM population, but did produce an additional association with suggestive significance, chr12:82,166,911, for the phenotype Unweighted UniFrac PCo3, though it was not robust to correction for multiple testing. This SNP is located in the gene LIN7A, which is widely expressed in endothelial cells. Markers in LD with the top SNPs were also highly associated with the phenotype; in addition, nearby markers in somewhat lower LD also displayed elevated significance for both hits. This provides an argument that these loci may not be due purely to chance. To be adequately powered, a study must have a large sample size or the single-SNP effect must be very large. However, most complex traits are polygenic, so many loci with small effects account for the variation of the trait. Therefore, where sample size is limited, it may be difficult to observe significant SNP associations. To address this, it is possible to use biological information to inform analyses and increase statistical power.
This may be done by aggregating the associations of multiple SNPs located within a known gene. By this approach, the possibly small effects of all SNPs in the gene are combined, and the association of the entire gene may then be determined. Even if no single SNP is found to be genome-wide significant, the combined SNP contributions across the gene may be. One widely used gene-based GWAS analysis method is the Knowledge-based mining system for Genome-wide Genetic Studies (KGG).
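The idea of combining SNP-level evidence into a single gene-level P value can be sketched with the plain Simes procedure. This is a deliberately simplified illustration and is not claimed to be the procedure KGG implements; in particular, it makes no adjustment for LD between SNPs, which dedicated gene-based tools generally model. The P values below are hypothetical.

```python
def simes_gene_p(snp_p_values):
    """Combine the P values of all SNPs in a gene into one gene-level P value.

    Plain Simes procedure: sort the m P values in ascending order and take the
    minimum of m * p_(j) / j over ranks j = 1..m.
    """
    p_sorted = sorted(snp_p_values)
    m = len(p_sorted)
    if m == 0:
        raise ValueError("at least one SNP P value is required")
    if p_sorted[0] < 0.0 or p_sorted[-1] > 1.0:
        raise ValueError("P values must lie in [0, 1]")
    combined = min(m * p / rank for rank, p in enumerate(p_sorted, start=1))
    return min(1.0, combined)


# Hypothetical P values for five SNPs mapped to one gene.
gene_p = simes_gene_p([0.004, 0.006, 0.008, 0.30, 0.75])
print(gene_p)
```

In this example several moderately small P values jointly yield a gene-level value smaller than a Bonferroni correction of the best single SNP (5 × 0.004), which is the gain that gene-based aggregation aims for.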