Twenty-seven weedy rice plants were chosen for genotyping, along with 12 accessions of temperate japonica varieties cultivated in California. Leaf tissue from the outdoor-grown plants was excised and desiccated for shipment to Clemson University for DNA extraction. DNA was extracted from the desiccated leaf tissue using the Macherey-Nagel NucleoSpin 96 Plant DNA extraction kit. Purified genomic DNA was diluted 2:1 in nuclease-free water for polymerase chain reaction (PCR). PCR was carried out under standard conditions to amplify 48 gene fragments selected by [21] from the 111 sequence-tagged sites (STS) developed by [38]. PCR products were checked by gel electrophoresis and cleaned up with Exonuclease and Antarctic phosphatase treatment following the method described in [39]. Direct sequencing in both the forward and reverse directions was carried out by the Clemson University Genomics and Computational Biology Laboratory. Sequences were assembled into contiguously aligned sequences ('contigs') and assigned quality scores using Phred and Phrap. Contigs were aligned and inspected visually for quality and heterozygous sites in BioLign version 4.0.6.2. Heterozygous base calls were randomly assigned to two pseudo-haplotypes, which were then phased using PHASE version 2.1. Because heterozygosity was low in the data set, haplotypes were inferred with very high probabilities and were consistent across five runs. All sequences have been submitted to NCBI GenBank. Phased haplotypes were aligned with sequences obtained from [21]. These additional sequences cover the same 48 STS loci for a broad range of AA genome Oryza species, including 58 weedy rice accessions sampled over a 30-year period from Arkansas, Louisiana, Mississippi, Missouri, and Texas. The dataset also includes sequences from the major cultivated groups from Asia and Africa, as well as wild species sampled from Asia, Africa, Central America, and Australia.
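The random assignment of heterozygous calls to two pseudo-haplotypes only prepares the input for PHASE; it does not phase anything itself. A minimal sketch, assuming heterozygous sites are encoded as two-base IUPAC ambiguity codes (R, Y, S, W, K, M) in a gap-free contig string; the function name is hypothetical.

```python
import random

# Two-base IUPAC ambiguity codes assumed to mark heterozygous calls
IUPAC_HET = {
    "R": ("A", "G"), "Y": ("C", "T"), "S": ("C", "G"),
    "W": ("A", "T"), "K": ("G", "T"), "M": ("A", "C"),
}


def split_pseudo_haplotypes(contig, rng=None):
    """Hypothetical helper: split one diploid contig into two pseudo-haplotypes.

    Homozygous sites are copied to both outputs; at each heterozygous site the
    two alleles are assigned to the two pseudo-haplotypes in random order.
    """
    rng = rng or random.Random()
    hap1, hap2 = [], []
    for base in contig.upper():
        if base in IUPAC_HET:
            a, b = IUPAC_HET[base]
            if rng.random() < 0.5:  # coin flip: phase is unknown at this stage
                a, b = b, a
            hap1.append(a)
            hap2.append(b)
        else:
            hap1.append(base)
            hap2.append(base)
    return "".join(hap1), "".join(hap2)


if __name__ == "__main__":
    h1, h2 = split_pseudo_haplotypes("ACRTGYA", random.Random(1))
    print(h1)
    print(h2)
```

The per-site coin flip is deliberate: the input order carries no phase information, and PHASE then infers the actual haplotypes statistically.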
Genetic structure and divergence. Summary statistics for each STS locus, including nucleotide diversity at silent sites with the Jukes-Cantor correction, Watterson's θ at silent sites, the number of segregating sites (S), and Tajima's D, were calculated in DnaSP version 5.0. Arlequin version 3.5 was used to calculate pairwise FST and ФST estimates, with 10,000 permutations to assess significance. Bonferroni corrections were used to determine P-value cutoffs. Recombination breakpoints in each locus were identified with the four-gamete test in SITES. FST, which compares genetic variation within groups with that between groups, was used to test the extent of genetic differentiation. To better estimate divergence between California weedy rice and other rice groups, we also used ФST, which is similar to FST but incorporates the distances between haplotypes rather than haplotype frequencies alone. Genetic diversity was measured by computing the average nucleotide diversity, the total number of segregating sites, and Watterson's θw within each field and within all fields combined. Population structure was inferred using InStruct, which accommodates inbreeding by not assuming Hardy-Weinberg equilibrium within populations; applying STRUCTURE to inbreeding populations results in inappropriately high rates of inferred splitting between populations. Five replicate runs were performed for each number of populations (K) from 1 to 22, each with 500,000 steps and a burn-in period of 100,000 steps. InStruct runs were completed on the Clemson University Condor computing cluster. Log likelihoods for each run were compared to determine the best-fit value of K. Distruct version 1.1 was used to create graphical displays of the InStruct results.
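As a rough illustration of the per-locus statistics named above, the sketch below computes Jukes-Cantor corrected nucleotide diversity, Watterson's θ, S, Tajima's D, and the four-gamete test from aligned haplotypes. It is not the DnaSP or SITES implementation: it assumes gap-free, equal-length sequences, uses all sites rather than silent sites only, applies the Jukes-Cantor correction to each pairwise p-distance before averaging, and all function names are hypothetical.

```python
import math
from itertools import combinations


def segregating_sites(seqs):
    """Alignment columns carrying more than one nucleotide."""
    return [i for i in range(len(seqs[0])) if len({s[i] for s in seqs}) > 1]


def n_diffs(a, b):
    """Number of nucleotide differences between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def jukes_cantor(p):
    """Jukes-Cantor corrected distance from an uncorrected p-distance."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def nucleotide_diversity_jc(seqs):
    """Mean Jukes-Cantor corrected pairwise distance per site."""
    length = len(seqs[0])
    dists = [jukes_cantor(n_diffs(a, b) / length) for a, b in combinations(seqs, 2)]
    return sum(dists) / len(dists)


def watterson_theta(seqs):
    """Watterson's theta per site: S / a1 / L."""
    a1 = sum(1.0 / i for i in range(1, len(seqs)))
    return len(segregating_sites(seqs)) / a1 / len(seqs[0])


def tajimas_d(seqs):
    """Tajima's D from uncorrected pairwise differences.

    Returns None when the statistic is undefined (S == 0 or fewer than 4 sequences).
    """
    n = len(seqs)
    s = len(segregating_sites(seqs))
    if s == 0 or n < 4:
        return None
    pairs = list(combinations(seqs, 2))
    pi = sum(n_diffs(a, b) for a, b in pairs) / len(pairs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n ** 2 + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


def four_gamete_violations(seqs):
    """Pairs of biallelic sites showing all four gametes (evidence of recombination)."""
    sites = [i for i in segregating_sites(seqs) if len({s[i] for s in seqs}) == 2]
    return [(i, j) for i, j in combinations(sites, 2)
            if len({(s[i], s[j]) for s in seqs}) == 4]


if __name__ == "__main__":
    haps = ["AAAAAAAAAA", "AAAAAAAAAT", "AAAAAAAATT", "AAAAAAATTT"]  # toy data
    print(nucleotide_diversity_jc(haps), watterson_theta(haps), tajimas_d(haps))
    print(four_gamete_violations(haps))
```

A locus with no four-gamete violations can be treated as a single nonrecombining block; any violating pair marks an interval where at least one recombination event must have occurred.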
Isolation with Migration (IM) modeling in IMa was used to identify best-fit isolation-with-migration models and simultaneously estimate effective population sizes, migration rates between populations, ancestral population size, and time since divergence.
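IMa reports these parameters in units scaled by the per-locus mutation rate u (θ = 4Nu, t = Tu, m = M/u, where T is the divergence time and M the per-generation migration rate per gene copy). The conversion to demographic units, done in this study with a mutation rate of 1 × 10⁻⁸ and a one-year generation time, can be sketched as follows. This assumes a single representative per-locus rate obtained by multiplying the per-site rate by a hypothetical locus length; the class, function, and input values are illustrative, not estimates from this study.

```python
from dataclasses import dataclass


@dataclass
class ImaEstimates:
    """Hypothetical container for mutation-scaled IMa estimates for one population."""
    theta: float  # 4 * N * u
    t: float      # divergence time T scaled by u (t = T * u)
    m: float      # migration rate per gene copy M scaled by u (m = M / u)


def to_demographic(est, mu_per_site, locus_length_bp, generation_time_years=1.0):
    """Convert scaled estimates to demographic units.

    Assumes one representative per-locus rate u = per-site rate x locus length,
    expressed per generation.
    """
    u = mu_per_site * locus_length_bp
    generations = est.t / u
    return {
        "Ne": est.theta / (4.0 * u),
        "generations": generations,
        "years": generations * generation_time_years,
        "2NM": est.theta * est.m / 2.0,  # population migration rate into this population
    }


if __name__ == "__main__":
    toy = ImaEstimates(theta=0.4, t=0.05, m=2.0)  # invented values
    print(to_demographic(toy, mu_per_site=1e-8, locus_length_bp=500))
```

Because T = t/u, any error in the assumed mutation rate shifts all divergence times by the same factor, which is one reason such estimates are best compared as relative values.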
California weedy rice was compared on a pairwise basis with California cultivated rice, strawhull weedy rice, blackhull weedy rice, O. rufipogon, and O. nivara. Recombination was detected only in O. rufipogon, so the longest nonrecombining blocks were used only in comparisons that included O. rufipogon. Each comparison was first run in M-mode with wide upper bounds on all parameters to determine the range of the posterior probability distributions. After this initial run, three runs were conducted with different random number seeds and narrower bounds based on the distribution of parameter values from the first run. All runs had 100,000,000 MCMC steps after a burn-in of 100,000 steps. Each run had 10 chains with a mixing rate of five chain swaps per step. All three M-mode outputs were checked for convergence, and L-mode runs were conducted on the tree files to test nested models. The maximum likelihood estimates were scaled into demographic values using a mutation rate of 1 × 10⁻⁸ and a generation time of one year, as in previous work. All IMa runs were computed on the Clemson University Condor cluster, managed primarily through an extensive web-enabled system for simultaneously launching and monitoring the runs for each set of input priors. The cluster allowed more than 28 runs to proceed simultaneously, so priors could be checked and adjusted as needed.

Multivariate analysis of trait variance. The goal of these phenotypic analyses was to elucidate genotype-phenotype relationships between California Oryza cultivars and weedy rice ecotypes. To this end, we identified the most influential phenotypic footprints of rapid divergence in domestic and wild-like traits of rice and its conspecific weed within the California Floristic Province.
To characterize trait variability, and by extension morphological relationships, among domesticated and weedy rice ecotypes in infested fields, we quantified phenotypic diversity by first describing the variance partitioning of weedy populations and then comparing the adaptive traits that characterized weedy rice with those that defined cultivars. To characterize the dimensionality of weedy or feral rice morphology more accurately, a subset of unique gourmet varieties was added to the medium-grain cultivars in the rice dataset for the phenotypic diversity analyses.
Qualitative descriptors were transformed using the PRINQUAL procedure of SAS, with the OPSCORE option for optimal scoring and the MONOTONE option to preserve monotonic order. Principal components analysis with maximum total variance was performed on the combined quantitative and transformed qualitative descriptors. The variables describing cultivars were reduced by first eliminating any that did not vary according to descriptive statistics and then using both random and a priori sampling to preserve group partitioning while identifying the eigenvectors that most clearly separated groups. UPGMA hierarchical clustering with the CLUSTER procedure of SAS was performed to confirm the separation of clusters on the PCs and to generate a dendrogram based on average Euclidean distances. Qualitative transformations, PCAs, and MANOVAs were executed in SAS version 9.3.

Average estimates of genetic differentiation among weedy rice populations in California rice fields are very low, ranging from 0 to 0.0026. There were no significant FST estimates at any of the 48 loci. The highest FST estimate was 0.077, between CRR1 and CRR4 at STS085. These low values indicate neither population structure nor divergence among weedy rice in the sampled fields, which supports assessing the genetic diversity of California weedy rice as a whole. Measures of genetic diversity for California weedy rice within each field, as well as for weedy rice across all fields combined, are also very low, consistent with a recent founder event or a strong population bottleneck. These values are a full order of magnitude lower than those calculated for strawhull and blackhull weedy rice ecotypes collected from the southern US. Because of the lack of population substructure and the low genetic diversity, we placed all California weedy rice into one group for the remaining analyses.
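Arlequin estimates FST through an analysis of molecular variance; as a simpler stand-in, the sketch below uses Hudson's FST (1 − Hw/Hb, from mean pairwise differences within and between groups) to illustrate a label-permutation significance test and a Bonferroni cutoff. The estimator swap, the function names, and the toy haplotypes are assumptions for illustration; each group needs at least two haplotypes.

```python
import random
from itertools import combinations, product


def _mean_diffs(pairs):
    """Mean number of nucleotide differences over sequence pairs."""
    diffs = [sum(x != y for x, y in zip(a, b)) for a, b in pairs]
    return sum(diffs) / len(diffs)


def hudson_fst(pop1, pop2):
    """Hudson-style FST = 1 - Hw/Hb (stand-in for Arlequin's AMOVA-based FST)."""
    hw = (_mean_diffs(combinations(pop1, 2)) + _mean_diffs(combinations(pop2, 2))) / 2.0
    hb = _mean_diffs(product(pop1, pop2))
    return 0.0 if hb == 0 else 1.0 - hw / hb


def fst_permutation_test(pop1, pop2, n_perm=10_000, rng=None):
    """Observed FST and a one-sided permutation P-value (group labels shuffled)."""
    rng = rng or random.Random()
    observed = hudson_fst(pop1, pop2)
    pooled = list(pop1) + list(pop2)
    k = len(pop1)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if hudson_fst(pooled[:k], pooled[k:]) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


def bonferroni_cutoff(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


if __name__ == "__main__":
    field_a = ["AAAA", "AAAT", "AAAA"]  # toy haplotypes, not study data
    field_b = ["TTTT", "TTTA", "TTTT"]
    fst, p = fst_permutation_test(field_a, field_b, n_perm=999, rng=random.Random(42))
    print(f"FST = {fst:.4f}, P = {p:.3f}, cutoff = {bonferroni_cutoff(0.05, 48):.5f}")
```

With 48 loci tested at α = 0.05, the Bonferroni cutoff is 0.05/48 ≈ 0.00104 per locus, so a locus is called significant only if its permutation P-value falls below that threshold.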
Mean population differentiation estimates across all 48 loci indicate high divergence between California weedy rice and all other sampled groups. The lowest mean value is with O. rufipogon collected from Southeast Asia. Because a mean can be driven by a few highly divergent loci, median values across the 48 loci better reflect the pattern across all loci. The lowest divergence was between California SHA weedy rice and southern US BHA and SH weedy rice; median ФST values indicated that, for at least half of the loci tested, divergence was an order of magnitude lower than the mean estimates. This indicates that the mean ФST is high because of divergence at a few loci, and that California weedy rice does share some similarity with weedy rice from the southern US at several loci. The most recent divergence of California weedy rice from other Oryza groups is from California rice cultivars, estimated at about 118 generations ago. The other divergence estimates were more than an order of magnitude older. Interestingly, both SH and BHA weedy rice from the southern US have very old divergence estimates: approximately 30,000 and 17,000 generations, respectively.
These estimates are likely inflated relative to the reported origin of domesticated rice approximately 10,000 generations ago, owing to interactions with other closely related genotypes. This interpretation follows work examining the model-testing performance of IMa under several scenarios, which showed that divergence estimates become inflated when gene flow from other populations is included in the model. A model of relative divergence times shows a shallow, recent coalescence of California weedy rice and California crop rice alleles, whereas SH and BHA southern US weedy rice and Chinese O. rufipogon show a much older divergence from California weedy rice. Migration estimates between California weedy rice and all other groups were quite low, with higher estimates of migration into California weedy rice in all cases. These estimates should therefore be interpreted as relative rather than absolute values; any overestimation of the generation values would mean that the divergence actually occurred even more recently. The effective population size for California weedy rice is very small in all comparisons, supporting a recent founder event or bottleneck.

California weedy rice differs morphologically from southern US weedy rice ecotypes. California weedy rice has a straw-colored hull with long awns, whereas only 7% of SH weedy rice in the southern US has awns. Nevertheless, California weedy rice shares important weedy traits with southern US weedy rice, including high seed shattering and a red pericarp, in addition to tall stature and a high tillering habit. Principal components analysis reduced the set of observed variables for California weedy and cultivated rice to orthogonal components ordered by their contribution to total variance.
No variation was observed among weedy and cultivated rice plants for leaf texture and angle, ligule shape, ligule color, ligule pubescence, auricle color, node color, or panicle secondary branching, so these traits were excluded from the analysis. When multiple traits represented the same metric, we chose the variable with the highest eigenvector loading to represent each group or suite of highly correlated traits, although every member of a group contributes to describing the underlying mechanism responsible for differences in phenotypic selection between cultivated and weedy rice in California. Importantly, pericarp color clearly distinguishes weedy from cultivated rice in California but is not highlighted in the dimension-reducing PCA because it was scored as a qualitative trait following International Rice Research Institute descriptor guidelines. The remaining phenotypic traits included vegetative growth habit characters, reproductive morphologies, and yield metrics related to grain morphology. Principal components analysis (PCA 1) was conducted on these informative traits, excluding highly correlated variables. A second PCA (PCA 2) was performed on a reduced dataset that included the five traits with the highest eigenvector values for each principal component in the initial analysis. In PCA 1, principal components 1, 2, and 3 together account for 45.18% of the total variance in cultivated and weedy rice. The traits that most strongly discriminated California weedy from cultivated rice were panicle type, leaf width, flowering, awn color, and culm length along PC1; lemma pubescence, texture of the panicle axis, length of the first leaf below the flag leaf, grain length/width ratio, and flag leaf width along PC2; and 100-grain weight of the field-collected mother plant, awn length of the field-collected and offspring plants, spikelet fertility, and grains per panicle along PC3.
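Ranking traits by their loadings on the leading components, as done to pick the five highest-loading traits per PC for the reduced dataset, can be sketched with NumPy. This assumes a correlation-matrix PCA (traits standardized to unit variance), which the text does not state explicitly, and omits the PRINQUAL optimal scoring of qualitative traits; the function names, trait names, and data are hypothetical.

```python
import numpy as np


def pca_correlation(X):
    """PCA of standardized traits (rows = plants, columns = traits).

    Returns eigenvalues (descending), loadings (one column per PC) and the
    percent of total variance explained by each PC. Invariant traits give a
    zero standard deviation, so they must be removed beforehand.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.cov(Z, rowvar=False)
    vals, vecs = np.linalg.eigh(corr)  # ascending order for symmetric matrices
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    return vals, vecs, 100.0 * vals / vals.sum()


def top_traits(vecs, names, n_pcs=3, k=5):
    """Names of the k traits with the largest absolute loading on each leading PC."""
    return [[names[i] for i in np.argsort(-np.abs(vecs[:, pc]))[:k]]
            for pc in range(n_pcs)]


if __name__ == "__main__":
    traits = ["culm_length", "tiller_number", "awn_length"]  # hypothetical names
    X = [[1, 2, 5], [2, 4, 3], [3, 6, 4], [4, 8, 1], [5, 10, 2]]  # toy data
    vals, vecs, pct = pca_correlation(X)
    print(np.round(pct, 1))
    print(top_traits(vecs, traits, n_pcs=1, k=2))
```

Absolute loadings are used because the sign of an eigenvector is arbitrary; a strongly negative loading is as informative as a strongly positive one.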
Because we were interested in the contribution of the suites of traits describing each statistically significant orthogonal vector, the 15 traits contributing most to variance along the first three PCs in PCA 1 were subjected to a second analysis. In PCA 2 of these "key discriminating traits," principal components 1, 2, 3, and 4 together account for 74.91% of the cumulative variance in the phenotype of California weedy rice. Following the Kaiser criterion, which retains components with eigenvalues greater than one, we report four principal components for this second PCA.
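The Kaiser criterion and the cumulative-variance figure reduce to simple arithmetic on the eigenvalues. The sketch below assumes a correlation-matrix PCA, in which the eigenvalues sum to the number of traits, so an eigenvalue above one marks a component that explains more variance than a single standardized trait. The eigenvalues in the usage example are invented, not values from this study, and the function names are hypothetical.

```python
def kaiser_retained(eigenvalues, threshold=1.0):
    """Number of components with an eigenvalue above the Kaiser threshold."""
    return sum(1 for v in eigenvalues if v > threshold)


def cumulative_percent(eigenvalues, n_components):
    """Percent of total variance explained by the n largest components."""
    vals = sorted(eigenvalues, reverse=True)
    return 100.0 * sum(vals[:n_components]) / sum(vals)


if __name__ == "__main__":
    # Invented eigenvalues for 15 standardized traits (they sum to 15)
    eig = [4.1, 3.0, 2.2, 1.3, 0.9, 0.7, 0.6, 0.5, 0.4, 0.35, 0.3, 0.25, 0.2, 0.1, 0.1]
    k = kaiser_retained(eig)
    print(k, round(cumulative_percent(eig, k), 2))
```

The Kaiser rule is a heuristic rather than a formal significance test, so it is often reported alongside the cumulative percentage of variance that the retained components explain.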