Cows appeared evenly spaced along the first principal axis with no clear gaps between observations.

After centering and scaling the cow attribute variables, linear fixed effects were added for cow age, calving date, and peak milk yield. Interaction effects were created for each combination of these linear terms, and a categorical effect was added for the control and treatment groups of the fat supplementation trial. Models were generated for both the complete dataset and the subset of animals with no recorded health events, comprising 160 and 104 cows, respectively, after removing animals with incomplete attribute records.

While UML insights served to improve the specification of model variance structures within animals, the validity of statistical inferences made at the between-animal level is still contingent upon correct estimation of the model degrees of freedom. A fundamental assumption of frequentist tests is that observations are sampled independently. When observations are not independent, the effective degrees of freedom in the model may be lower than the nominal value. The model then underestimates its standard errors and becomes overconfident in its estimates, increasing the risk of a false positive result. Non-independence due to repeated sampling has been accounted for here by fitting a random effect for each cow, but non-independence between animals has not been accommodated. The diffusion map and data mechanics visualizations did not recover overwhelming evidence of coordinated movements between animals through the queue, which would have signified non-independence due to social cohesion. However, we both visualized via data mechanics and know intuitively that this physically constrained system behaves much like mobile racking: any cow moving forward in the queue must be countered by other cows being forced backwards, and vice versa.
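The loss of effective sample size under between-animal dependence can be made concrete with a standard result that is not part of the original analysis: if n cow-level observations share a common pairwise correlation rho, the variance of their mean is sigma^2 (1 + (n - 1) rho) / n, so they carry the same information as n / (1 + (n - 1) rho) independent observations. The Python sketch below assumes this equicorrelated structure purely for illustration (real queue dependence is unlikely to be this simple) and checks the formula with a small Monte Carlo simulation; `effective_n` and `simulated_var_of_mean` are hypothetical helpers.

```python
import math
import random
import statistics


def effective_n(n: int, rho: float) -> float:
    """Effective sample size of n equicorrelated observations.

    Var(mean) = sigma^2 * (1 + (n - 1) * rho) / n, which equals the variance
    of a mean over n / (1 + (n - 1) * rho) independent observations.
    """
    if n < 2 or rho <= -1.0 / (n - 1):
        raise ValueError("need n >= 2 and rho > -1/(n-1)")
    return n / (1.0 + (n - 1) * rho)


def simulated_var_of_mean(n: int, rho: float, reps: int = 5000, seed: int = 1) -> float:
    """Monte Carlo variance of the mean of n unit-variance normals with pairwise correlation rho >= 0."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        shared = rng.gauss(0.0, 1.0)  # component common to every animal (induces correlation rho)
        total = 0.0
        for _ in range(n):
            own = rng.gauss(0.0, 1.0)  # animal-specific component
            total += math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * own
        means.append(total / n)
    return statistics.variance(means)


if __name__ == "__main__":
    n, rho = 104, 0.05  # hypothetical: 104 cows, weak positive between-cow correlation
    print(f"nominal n = {n}, effective n = {effective_n(n, rho):.1f}")
    print(f"simulated effective n = {1.0 / simulated_var_of_mean(n, rho):.1f}")
```

With a hypothetical rho of 0.05, the 104 healthy cows would carry roughly the information of 17 independent animals (104 / 6.15 is about 16.9), which shows how even weak dependence can shrink the effective sample size.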

If this zero-sum displacement extends beyond isolated fluctuations in the daily formation of the queue, then the presence of some animals in the herd might systematically suppress, or even completely prevent, behavioral patterns that other animals would otherwise display independently or in another herd with a different social composition. This would not only confound the behavioral mechanisms at play; cows whose behaviors are suppressed by their herd mates also cannot be said to contribute fully to the model, potentially reducing the effective sample size. This could allow sampling fluctuations to produce misleading statistical inferences, even in this large sample of animals. UML algorithms cannot recover information about behaviors that were never expressed, and so are not immune to the biasing effects of non-independence between animals either. These tools can, however, provide model-free tests of association that may serve as a sanity check for statistical inferences when the degrees of freedom are uncertain. We explore this option here by again combining modern clustering tools with a flexible information-theoretic approach to pattern detection. First, independent clustering trees were used to subdivide the herd based on queuing records and on each of the cow attribute variables. The resulting categorical variables were then used to form contingency tables between queue subgroups and each of the candidate predictor variables. If no relationship existed between these two axes, then a cow belonging to a given row category based on queue records would be just as likely to belong to any of the column categories based on cow attributes, and vice versa.
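Once each cow carries a queue-subgroup label and an attribute-subgroup label, forming the contingency table and the cell counts expected under independence is straightforward. The Python sketch below assumes the clustering-tree step has already produced these labels (the labels shown are hypothetical) and uses the usual independence expectation, in which each expected cell count is the row total times the column total divided by the number of cows.

```python
from collections import Counter


def contingency_table(row_labels, col_labels):
    """Cross-tabulate two per-cow labelings (e.g. queue subgroup vs. attribute subgroup)."""
    rows = sorted(set(row_labels))
    cols = sorted(set(col_labels))
    counts = Counter(zip(row_labels, col_labels))
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


def expected_under_independence(table):
    """Cell counts expected if row and column memberships were unrelated."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return [[r * c / n for c in col_totals] for r in row_totals]


# Hypothetical labels for six cows.
queue_group = ["front", "front", "back", "back", "middle", "middle"]
age_group = ["young", "young", "old", "old", "young", "old"]
rows, cols, table = contingency_table(queue_group, age_group)
print(rows, cols, table)  # ['back', 'front', 'middle'] ['old', 'young'] [[2, 0], [0, 2], [1, 1]]
print(expected_under_independence(table))  # [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
```

Departures of the observed counts from these expectations are what the mutual conditional entropy test in this section is meant to quantify.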

If, instead, an underlying biological mechanism linked queue position and cow attributes, then cows within a given range of attribute values would be spread unevenly among queue subgroups. Such heterogeneity in cell counts was quantified by calculating weighted conditional entropies across first the rows and then the columns of the contingency table and averaging the results to give a mutual conditional entropy (MCE) value; lower MCE values indicate that more information is shared between the two variables. To determine whether the observed MCE value was significantly smaller than would be expected from random fluctuations in the sample, row and column classifiers were randomly permuted across cows to remove any underlying bivariate relationship, and MCE was recalculated. This randomization procedure was repeated over 2000 iterations, and the observed entropy value was compared with the resulting empirical CDF to produce a p-value for the significance of the bivariate association. MCE tests were performed for all significant or marginally significant linear effects in both regression models.

Because this herd was also fitted with ear tag accelerometers, it is also possible here to explore relationships between queue position and the behavioral patterns displayed between milkings. Due to the size of these datasets, however, this small step beyond the bounds of the existing literature constitutes a considerable leap in statistical complexity within a linear modeling framework: a multivariate mixed model that considers all observations from both datasets would exceed the capacity of many solvers. A simpler approach might therefore be to compress the information available in parlor entry records into a grouping variable and then attempt to identify differences in the various home pen behaviors across the resulting subsections of the herd.
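The mutual conditional entropy test can be sketched in Python as follows. This is one plausible reading of the procedure described above, not the original implementation: each conditional entropy is weighted by group size, the two directions are averaged, natural logarithms are used (the base does not affect the p-value), and only the column labels are shuffled, which breaks the bivariate relationship just as permuting both classifiers would. The add-one correction in the p-value and all function names are our own assumptions.

```python
import math
import random
from collections import Counter, defaultdict


def entropy(counts):
    """Shannon entropy (nats) of a collection of category counts."""
    total = sum(counts)
    return -sum(c / total * math.log(c / total) for c in counts if c > 0)


def conditional_entropy(given, target):
    """H(target | given): entropy within each 'given' group, weighted by group size."""
    groups = defaultdict(list)
    for g, t in zip(given, target):
        groups[g].append(t)
    n = len(target)
    return sum(len(ts) / n * entropy(Counter(ts).values()) for ts in groups.values())


def mutual_conditional_entropy(rows, cols):
    """Average of the row-wise and column-wise weighted conditional entropies."""
    return 0.5 * (conditional_entropy(rows, cols) + conditional_entropy(cols, rows))


def mce_permutation_test(rows, cols, n_perm=2000, seed=0):
    """One-sided test: is the observed MCE smaller than under random label pairing?"""
    rng = random.Random(seed)
    observed = mutual_conditional_entropy(rows, cols)
    shuffled = list(cols)
    at_or_below = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # re-pair column labels with cows at random
        if mutual_conditional_entropy(rows, shuffled) <= observed:
            at_or_below += 1
    p_value = (at_or_below + 1) / (n_perm + 1)  # empirical CDF at the observed value
    return observed, p_value
```

For perfectly aligned labels the observed MCE is zero and no shuffle reaches it, so the p-value bottoms out at 1/2001 with 2000 permutations; for unrelated labels the observed value sits inside the permutation distribution.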

We implement this grouping strategy here by using the nlme package to fit linear mixed models, with cow fit as a random intercept, for each of the five behaviors recorded by the CowManager platform and for average body temperature. To avoid anomalous behaviors that might skew model inferences, only cows with no recorded health events were used. Hour of the day was fit as a categorical variable to capture cyclical patterns. Days on trial was also fit as a categorical fixed effect to allow for non-smooth longitudinal changes in behavior due to weather and the shift to pasture. Finally, queue groups were determined by arbitrarily dividing the herd into quartiles based on median entry position. The resulting categorical variable was fit both as a main effect and in interactions with the cyclic and longitudinal time effects. Due to the size of the model, neither temporal correlation nor heterogeneous variance structures would converge in this package. Comparisons of the cyclic and longitudinal trends in behavioral patterns between queue groups were made using the plotting utility available in the emmeans package, with the complete results provided in the supplemental materials. While linear models provide an expedient means of statistically evaluating targeted experimental hypotheses, the more open-ended approach to knowledge discovery provided by UML algorithms may offer an advantage in exploratory data analysis problems such as this. We explore the utility of this alternative strategy by again employing a mutual conditional entropy test to identify significant associations between these two behavioral axes. The flexibility of hierarchical clustering tools allows this technique to be extended directly from the previous section, which compared repeated measures of queue position against a univariate covariate, to a comparison in which both datasets are high-dimensional.
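Dividing the herd into queue quartiles by median entry position can be sketched in Python with the standard library. The dictionary of median positions is hypothetical, and sending cows that fall exactly on a cut point to the later quartile (via `bisect_right`) is our assumption rather than a detail from the original analysis.

```python
import bisect
import statistics
from collections import Counter


def queue_quartile_groups(median_positions):
    """Map each cow to a queue quartile: 0 = front of the queue ... 3 = back.

    median_positions: dict of cow id -> median parlor entry position.
    Cows exactly on a cut point go to the later quartile (bisect_right).
    """
    cuts = statistics.quantiles(median_positions.values(), n=4)  # three cut points
    return {cow: bisect.bisect_right(cuts, pos) for cow, pos in median_positions.items()}


# Hypothetical herd of 100 cows with distinct median entry positions 1..100.
medians = {f"cow{i}": float(i) for i in range(1, 101)}
groups = queue_quartile_groups(medians)
print(Counter(groups.values()))  # 25 cows in each quartile
```

Real median positions will contain ties, so quartile sizes will usually differ slightly from an even split.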
For each parameter recorded by the CowManager platform, the model-free MCE test of association was performed on the complete sensor record, on subsets of the records corresponding to each of the three lounging periods, vertical grow system, and finally on a subset of the records in which observations from all three lounging periods had been aggregated. As in the previous section, the number of clusters used to discretize queue and sensor data was evaluated on a grid, here over tree depths 2 to 10. To characterize the divergent behavioral patterns across queue groups identified by significant tests of association, tube plots were created by plotting each within-day subgroup median on a circular grid and then stacking the rings to form a tube using the 3D plotting tools in the plotly package.

Looking first at the entropy calculations for each segment of the queue visualized in Figure 1, it is clear that not all parlor entry positions are stochastically equivalent. The same animals are seen consistently at the very front and back of the queue, such that the resulting entropy values are far lower than would be seen with a purely random queueing process. Moving towards the middle of the queue, however, there is progressively less consistency in the animals present across milkings, such that the observed entropy values approach those of a random process. Looking next at the stochasticity demonstrated by each individual cow in Figure 2, we again see a clear gradient.
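The segment-wise comparison against a purely random queueing process can be sketched in Python as below. The simulated herd (each cow sorted on a hypothetical preferred position plus Gaussian noise), the equal-width position segments, and the assumption that every milking contains the same cows are illustrative choices, not details of the original analysis. Because the noise is constant across cows, the sketch only shows that a structured queue yields lower segment entropy than a random one; it does not attempt to reproduce the front-to-middle gradient in Figure 1.

```python
import math
import random
from collections import Counter


def shannon_entropy(labels):
    """Shannon entropy (nats) of the identities in a list of cow ids."""
    counts = Counter(labels)
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in counts.values())


def segment_entropies(orders, n_segments=4):
    """Entropy of cow identities observed in each queue segment across milkings.

    orders: list of milkings, each a list of cow ids in parlor entry order.
    Assumes every milking contains the same cows.
    """
    herd_size = len(orders[0])
    occupants = [[] for _ in range(n_segments)]
    for order in orders:
        for position, cow in enumerate(order):
            occupants[position * n_segments // herd_size].append(cow)  # equal-width segments
    return [shannon_entropy(cows) for cows in occupants]


def simulate_orders(n_cows, n_milkings, noise, rng):
    """Hypothetical queue: cows sort on a fixed preferred quantile plus Gaussian noise."""
    preferred = [cow / n_cows for cow in range(n_cows)]
    return [
        sorted(range(n_cows), key=lambda cow: preferred[cow] + rng.gauss(0.0, noise))
        for _ in range(n_milkings)
    ]


rng = random.Random(42)
structured = simulate_orders(n_cows=40, n_milkings=60, noise=0.05, rng=rng)
randomized = [rng.sample(range(40), 40) for _ in range(60)]  # purely random queueing
print("structured:", [round(h, 2) for h in segment_entropies(structured)])
print("random:    ", [round(h, 2) for h in segment_entropies(randomized)])
print("maximum possible:", round(math.log(40), 2))
```

Under random queueing every segment approaches the maximum entropy of log(40); consistent animals pull a segment's entropy well below it.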

Cows with median entry quantiles at the front and rear of the herd again show far greater consistency in their entry positions. As their median quantile position moves towards the center of the herd, they become more variable in their entry positions over the observation window. This gradient is seen using both entropy and variance as estimators of stochasticity, but is more visually distinct using entropy estimates. While discretizing an intrinsically continuous parameter results in a loss of information, we see here that this sacrifice has filtered out extraneous noise in the system, bringing the underlying stochastic pattern into clearer resolution. These data thus highlight the potential upside of adding entropy estimates to the traditional cadre of summary statistics, particularly when working with outcome variables that are prone to extreme or anomalous values.

In the permutation tests, nearly all animals demonstrated significantly less stochasticity in their entry positions than a completely randomized queueing process at the standard 0.05 significance level. Only 3 of 114 cows overlapped with the empirical distribution of entropy estimates under a randomized queueing pattern, and only 1 cow overlapped when variance was used as the estimator of stochasticity. This suggests that nearly all animals in the herd might contribute some information about the underlying nonrandom patterns in queue formation to subsequent analyses; however, the amount of information they contribute may not be equal, as there is considerable heterogeneity between cows. Of greater concern, this heterogeneity is systematic: no cows show high consistency in entry quantile in the center of the queue.
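A per-cow version of the permutation test can be sketched in Python as follows, assuming entry quantiles are scaled to [0, 1] and that, under a completely randomized queue, a cow's position at each milking is uniform over the herd. The ten-bin discretization, the add-one p-value correction, and the function names are illustrative assumptions, not details taken from the original analysis.

```python
import math
import random
from collections import Counter


def population_variance(values):
    """Population variance of a list of floats."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def binned_entropy(quantiles, n_bins=10):
    """Shannon entropy (nats) of entry quantiles discretized into equal-width bins on [0, 1]."""
    bins = Counter(min(int(q * n_bins), n_bins - 1) for q in quantiles)
    n = len(quantiles)
    return -sum(c / n * math.log(c / n) for c in bins.values())


def cow_stochasticity_test(quantiles, herd_size, n_perm=2000, n_bins=10, seed=0):
    """Compare one cow's entropy and variance with those expected under random queueing."""
    rng = random.Random(seed)
    obs_entropy = binned_entropy(quantiles, n_bins)
    obs_variance = population_variance(quantiles)
    low_entropy = low_variance = 0
    for _ in range(n_perm):
        # Null: at every milking the cow occupies a uniformly random position.
        null = [rng.randrange(herd_size) / (herd_size - 1) for _ in quantiles]
        if binned_entropy(null, n_bins) <= obs_entropy:
            low_entropy += 1
        if population_variance(null) <= obs_variance:
            low_variance += 1
    return {
        "entropy": obs_entropy,
        "p_entropy": (low_entropy + 1) / (n_perm + 1),
        "variance": obs_variance,
        "p_variance": (low_variance + 1) / (n_perm + 1),
    }


# Hypothetical cow that almost always enters near the front of a 114-cow herd.
front_cow = [0.02, 0.03, 0.05, 0.04] * 20
print(cow_stochasticity_test(front_cow, herd_size=114))
```

Because entry quantiles are confined to [0, 1], a cow whose mean quantile is mu cannot have a variance above mu(1 - mu) (the Bhatia-Davis inequality). Cows at the very front or back are therefore pushed toward low variance regardless of any behavioral mechanism, which is one concrete form the inherent domain constraint raised in this section could take.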
If this systematic heterogeneity is not driven by variability in the underlying predictors of queue position, but instead reflects either an underlying behavioral mechanism or something even more fundamental to this system, such as the inherent domain constraint, it could lead to inaccurate statistical inferences. These simple visualizations provide clear evidence that, to avoid such risks, a nontrivial variance model should be incorporated during model specification to accommodate the heterogeneous variance structures in this dataset. Finally, the insights gleaned from these entropy-based visualization techniques agree well with the prior literature. Previous studies have repeatedly found milk order records to be significantly more consistent than would be expected from a random queuing process, using an array of correlation- and regression-based approaches. Fewer papers, however, have explored differences in the consistency of entry positions between animals. Gadbury observed that only a subset of his herd seemed to demonstrate clear preferences for parlor entry positions. Such preferences do not appear to have been constrained to the front or back of the queue, however, as Gadbury also reported animals with a preference for the middle of the queue. In a more recent analysis of large commercial herds, Beggs et al. reported a nearly identical parabolic relationship between mean entry quantile and variance. With clear and consistent evidence of nonrandom patterns recovered from this dataset, further investigation of the behavioral mechanisms that might give rise to such heterogeneity in milk order records was clearly warranted.

Visual inspection of the scree plot produced by PCA revealed that only one significant dimension was recovered from the original 80-dimensional dataset. To visualize the resulting projections, the first two principal components were plotted.
In two dimensions, points also appeared randomly scattered with no clear clustering. Thus, the PCA results revealed no compelling visual evidence of social cohesion. As this was the only significant dimension, a linear model predicting variation in the central moment may be a reasonable representation of this dataset. This feature of the dataset was not, however, self-evident in the geometric relationships between data points revealed by the PCA projection, and thus might have been overlooked had color encoding by median entry quantile not been specified a priori.

Evaluation of the eigenvalues returned by the diffusion map embedding identified five significant dimensions. The 3D visualizations of these axes in Figure 3:B and in the supplemental materials revealed quite clearly the underlying linear geometry of this dataset. Color encodings showed that the relative positions of animals along this narrow geometric band were determined by median entry quantile, further reinforcing that central moment was the most defining feature of this dataset.
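The scree-plot check can be sketched with NumPy by computing PCA eigenvalues through a singular value decomposition. We assume, for illustration only, that the 80 dimensions are per-milking entry quantiles for each cow; the data below are simulated with a single latent typical position per cow, so one dominant eigenvalue is expected and the first component should track each cow's mean entry quantile. The diffusion map embedding is not reproduced here.

```python
import numpy as np


def pca(x):
    """PCA via SVD of the column-centered data matrix.

    Returns covariance eigenvalues (largest first) and component scores.
    """
    centered = x - x.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    eigenvalues = s**2 / (x.shape[0] - 1)
    scores = centered @ vt.T
    return eigenvalues, scores


rng = np.random.default_rng(0)
n_cows, n_milkings = 114, 80  # hypothetical cows x milkings matrix
typical_q = rng.uniform(0.05, 0.95, size=n_cows)  # each cow's latent entry quantile
x = np.clip(typical_q[:, None] + rng.normal(0.0, 0.05, size=(n_cows, n_milkings)), 0.0, 1.0)

eigenvalues, scores = pca(x)
explained = eigenvalues / eigenvalues.sum()
print("proportion of variance, first five components:", np.round(explained[:5], 3))
# The sign of a principal component is arbitrary, so only |correlation| is meaningful.
print("corr(PC1, mean quantile):", np.corrcoef(scores[:, 0], x.mean(axis=1))[0, 1])
```

Where the scree plot "breaks" remains a judgment call; the sketch only prints the proportions of variance so the dominant first component can be seen.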

