If instead some cows were less consistent in their time budgets across days, then the sampling error imposed by the subsampling routine would be greater, resulting in a larger ensemble variance estimate. Thus, we would expect a stronger penalty to be applied to cows that demonstrated greater plasticity in their behavioral response to both transient and persistent changes in the production environment. For small datasets with a limited number of replications, the number of subsamples could be set quite close to the size of the complete sample, which would emulate a jackknife approach to variance estimation. For larger datasets, however, the subsample size could be set smaller to make the resulting ensemble variance estimates progressively more sensitive to the uncertainty in the underlying behavioral signal. To evaluate the empirical performance of these dissimilarity estimators, distance matrices were calculated for the 177 cows with complete CowManager time budget records. Euclidean distance and KL distance were calculated using base R utilities, with speed-up options utilizing the Rfast package. An ensemble-weighted dissimilarity matrix was first calculated using simulated values accounting only for measurement error under the most conservative joint Dirichlet-Multinomial sampling scheme, hereafter referred to as the Noise-Penalized Distance. A second ensemble-weighted dissimilarity matrix was then calculated using the same sampling scheme for measurement noise but aggregated over a 14-day subsample to account for behavioral plasticity in daily time budgets, hereafter referred to as the Plasticity-Penalized Distance.
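The ensemble-weighting idea described above can be sketched compactly outside of R. The following is a minimal Python illustration, not the LIT implementation: each replicate perturbs the observed daily time budgets with a joint Dirichlet-Multinomial draw, and the pairwise Euclidean distance matrix is averaged over the ensemble. The function names, hyperparameters, and toy budgets are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_budget(counts, alpha=1.0, rng=rng):
    # One noise replicate of a cow's daily time budget: draw state
    # proportions from a Dirichlet over the observed counts, then
    # resample the day's minutes from a Multinomial.
    p = rng.dirichlet(counts + alpha)
    n = int(counts.sum())
    return rng.multinomial(n, p) / n  # return as proportions

def ensemble_distance(budgets, n_sims=200, rng=rng):
    # Ensemble-weighted dissimilarity: average the pairwise Euclidean
    # distance matrix over many noise-perturbed replicates.
    m = len(budgets)
    acc = np.zeros((m, m))
    for _ in range(n_sims):
        sims = np.array([simulate_budget(b, rng=rng) for b in budgets])
        diff = sims[:, None, :] - sims[None, :, :]
        acc += np.sqrt((diff ** 2).sum(axis=2))
    return acc / n_sims

# toy data: minutes/day in (eating, ruminating, resting, active)
budgets = np.array([[300.0, 450.0, 600.0, 90.0],
                    [310.0, 440.0, 610.0, 80.0],
                    [150.0, 300.0, 900.0, 90.0]])
D = ensemble_distance(budgets)
```

Cows with nearly identical budgets (the first two rows) remain close under the ensemble average, while the penalty from simulated measurement noise is shared across all pairs.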
The LIT package provides users a clustering visualization utility, which converts dissimilarity matrices into a dendrogram using the hclust utility in base R with default Ward D2 linkage and then generates heatmap visualizations of the resulting clustering results using the pheatmap package. Heatmaps were generated on a grid of cluster values from k = 1-10 for each of the four dissimilarity estimators, with complete results provided in Supplemental materials, and the results for k = 10 clusters compared in Figure 2. The LIT package also provides users a plotting utility to visually contrast the broader patterns between behavioral encodings. Outputs from the clustering utility are passed in to create a contingency matrix generated using ggplot2, with cells colored by their corresponding cell count. The heatmap visualizations for each encoding are then added to the row and column margins of the contingency matrix using the ggpubr package, and arranged such that each row cluster in either heatmap matches the order of the contingency matrix reading either top-to-bottom or left-to-right, allowing for direct and detailed visual comparison of the discretized behavioral patterns. Comparisons between the Noise-Penalized and Plasticity-Penalized encodings are provided in Figure 3.

An optimal encoding strategy seeks to minimize the loss of relevant information by retaining as much of the underlying deterministic signal as possible while discarding only noise. In a hierarchical clustering framework, this is achieved by pruning the dendrogram built from the dissimilarity matrix where the branches cease to represent differences in the underlying signal. Standard pruning strategies typically either 1) allow the user to provide a dissimilarity cutoff, below which all further branches are grouped into the same bin, or 2) allow users to extract the first k branches of the tree.
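The cluster-then-cut step can be sketched with scipy equivalents of the base R utilities (scipy's "ward" linkage on Euclidean distances corresponds to hclust's ward.D2, and fcluster with the maxclust criterion plays the role of cutree). The toy data and variable names below are hypothetical illustrations, not the package's code.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
# toy time-budget-like vectors: two well-separated groups of 5 cows
X = np.vstack([rng.normal(0.0, 0.05, (5, 4)),
               rng.normal(1.0, 0.05, (5, 4))])

# precomputed pairwise distance matrix (one of the dissimilarity
# estimators described above would be substituted here)
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(axis=2))

# Ward linkage on the condensed distances, then extract the first
# k = 2 branches of the tree, analogous to cutree(tree, k = 2)
Z = linkage(squareform(D, checks=False), method="ward")
labels = fcluster(Z, t=2, criterion="maxclust")
```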
As with the default Euclidean distance dissimilarity estimator, this approach may be appropriate for datasets with relatively homogeneous variance structures. For data drawn from intrinsically heterogeneous distributions, however, the branch lengths cannot be directly compared across the domain of support, making globally defined pruning rules a suboptimal strategy for the analysis of time budget data. More fundamentally, a homogeneous pruning strategy may be too simplistic for many PLF sensor datasets, for which the underlying signal often represents a complex composite of behavioral mechanisms that operate at multiple scales. While some environmental factors might be expected to have an impact on cattle behaviors that is fairly uniform across the herd, other factors might elicit responses that differ in magnitude between subgroups within the larger population, or even become isolated within smaller social cliques. For example, we might expect that the number of times cows are moved each day for milking will place similar constraints on the time left to lie down across all animals, but overstocking with respect to stall spaces might have a much larger impact on the lying patterns of subordinate heifers than on more dominant older cows. In such a complex system, we would expect the heterogeneity imposed by the underlying biological signal to differ in scale across the dataset. Subsequently, in attempting to employ a global cutoff decision to encode information for such a dataset, we would always be faced with the difficult decision to either ignore the subtler behavioral patterns present in some branches of the tree, or else allow noise to contaminate our encoding of other branches with intrinsically coarser behavioral patterns.
While all the components that contribute to the signal in a complex livestock system might be difficult to anticipate a priori, we propose that a more dynamic pruning algorithm might still be achieved by again employing flexible simulation-based approaches to emulate the comparatively simpler sources of uncertainty. If each branch of the dendrogram is viewed as a pairwise contrast between two groups of animals, then we need only determine whether the bifurcation under inspection represents a difference in the underlying signal that can be reliably distinguished from noise. By implementing such a branch-level test recursively, we can gradually work our way down the tree with locally defined pruning decisions. To evaluate the reliability of the behavioral signal encoded at each bifurcation of the tree, our branch test utility utilizes two sets of data mimicries. The first set of simulations is generated under the alternative hypothesis, which assumes a branch contains an underlying deterministic signal that is only partially obscured by stochastic noise. Here we can simply repurpose the ensemble of simulated data sets used previously to calculate the ensemble-weighted dissimilarity metrics by mimicking the uncertainty in the observed data. The second set of simulations is generated under the null hypothesis that the branch contains no reliable signal. As the null implies that animals demonstrate equivalent patterns of behavior within the resolution of the sample, this mimicry can be generated quite efficiently using a standard bootstrapping routine, wherein time budgets simulated under the alternative are unconditionally resampled from amongst all animals in a given branch. Hierarchical clustering is then performed independently on each data mimicry in either ensemble and the first k branches are extracted to create an ensemble of discrete encodings. Under the alternative hypothesis, a strong signal should produce a robust tree structure such that, even after the addition of simulated noise, the resulting encoding would still closely mirror that of the original observed data.
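The two mimicry ensembles can be sketched as follows. This is an illustrative Python toy, not the package's R code: Gaussian noise stands in for the Dirichlet-Multinomial draws used by the dissimilarity ensembles, and the function names (encode_k2, alt_mimicry, null_mimicry) are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(1)

def encode_k2(X):
    # Ward clustering of the rows of X, cut into the first k = 2 branches
    Z = linkage(pdist(X), method="ward")
    return fcluster(Z, t=2, criterion="maxclust")

def alt_mimicry(X, noise=0.05, rng=rng):
    # Alternative hypothesis: the observed budgets carry a real signal
    # partially obscured by noise (Gaussian here as a simple stand-in)
    return X + rng.normal(0.0, noise, X.shape)

def null_mimicry(X, rng=rng):
    # Null hypothesis: no signal within the branch, so simulated budgets
    # are unconditionally bootstrap-resampled from amongst all animals
    idx = rng.integers(0, len(X), len(X))
    return alt_mimicry(X[idx], rng=rng)

# a branch containing two genuinely distinct groups of 6 animals
X = np.vstack([rng.normal(0.0, 0.05, (6, 4)),
               rng.normal(1.0, 0.05, (6, 4))])
obs = encode_k2(X)
alt = encode_k2(alt_mimicry(X))

def partition(labels):
    # compare encodings as partitions, ignoring arbitrary label order
    return {frozenset(np.flatnonzero(labels == c)) for c in set(labels)}
```

With a strong signal, the alternative mimicry recovers the same partition as the observed data, which is exactly the stability the branch test looks for.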
As the stochastic component of a dataset becomes stronger relative to the signal, these bifurcation points will become progressively less stable and the subsequent encodings less reliably aligned with the original data. When the signal falls below the resolution of the data, the tree structures of the simulated data would then seldom match that of the original data, and so would become poorly distinguished from encodings generated under the null with no signal component. We propose that mutual information, which can be calculated without any additional distributional assumptions, can be used to quantify the similarity between the encodings of the observed data and each mimicked dataset, and subsequently used to determine whether simulations under the alternative are distinguishable from those under the null.

Equipped with an appropriate encoding to discretely represent the heterogeneity in overall time budgets within this herd, and provided the encoding of longitudinal patterns in parlor entry position from previous work with this data set, a potential question to ask would be whether these two discretized behavioral responses are associated. There are a number of nonparametric and parametric techniques available to evaluate the overall strength of association between two discrete variables by evaluating the distribution of animals in the joint encoding.
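Scoring the agreement between two discrete encodings with mutual information can be illustrated with a small plug-in estimator. This is a generic sketch (the estimator names and toy labels are hypothetical); note that MI is invariant to relabeling, so a mimicry that recovers the same partition under swapped labels still scores maximally.

```python
import numpy as np

def plugin_mi(a, b):
    # Plug-in mutual information (in nats) between two discrete label
    # vectors, computed from the empirical joint distribution
    a, b = np.asarray(a), np.asarray(b)
    mi = 0.0
    for x in np.unique(a):
        for y in np.unique(b):
            pxy = np.mean((a == x) & (b == y))
            if pxy > 0:
                mi += pxy * np.log(pxy / (np.mean(a == x) * np.mean(b == y)))
    return mi

obs  = [1, 1, 1, 2, 2, 2]   # encoding of the observed data
sim  = [2, 2, 2, 1, 1, 1]   # a mimicry recovering the same split (labels swapped)
rand = [1, 2, 1, 2, 1, 2]   # a mimicry with no matching structure
```

Here plugin_mi(obs, sim) equals the full marginal entropy log(2), while plugin_mi(obs, rand) is much smaller, mirroring how well-aligned mimicries separate from null-like ones.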
There is, however, perhaps greater practical utility in characterizing low and high points within the joint encodings, which would provide more detailed insights into the tradeoffs between specific behavioral patterns recovered from the data streams in these distinct farm contexts. Towards this end, information theory offers a more comprehensive approach to decomposing the stochasticity within discretely encoded variables, and thus may provide a more holistic approach to evaluating both the global and local features of a joint encoding while employing few structural assumptions. First, to evaluate the strength of the overall relationship between two discretized behavioral responses, the LIT package provides users a permutation-based bivariate testing utility that uses the mutual information estimator to quantify the amount of information entropy that is redundant between the two encodings. We can anticipate, however, that the efficacy of this test in recovering significant relationships between the underlying biological signals will be affected by the resolutions of the encodings. Suppose that a single latent biological factor impacts the behavioral responses collected by both PLF data streams, creating informational redundancy between the two encodings. If we cut the trees above the intrinsic magnitude of its impact on a given behavior, its influence may be overlooked and the mutual information under-estimated. On the other hand, if we prune the tree far below the magnitude of its impact, our inferences can lose power as bin sizes in the joint encoding become progressively smaller, weakening the empirical estimation of the joint probability distribution and thereby increasing estimation error in the MI estimator. The resolution of our encodings must, therefore, be optimized to match the dynamics of the system, or a false negative result may be returned.
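At a fixed pair of resolutions, the permutation-based bivariate test can be sketched as follows. This is an illustrative Python toy under assumed names (plugin_mi, mi_permutation_test) rather than the package's utility: shuffling cow IDs along one axis preserves both marginal distributions while destroying any bivariate relationship, giving a null distribution for the observed MI.

```python
import numpy as np

rng = np.random.default_rng(2)

def plugin_mi(a, b):
    # plug-in mutual information (nats) from the empirical joint table
    a, b = np.asarray(a), np.asarray(b)
    mi = 0.0
    for x in np.unique(a):
        for y in np.unique(b):
            pxy = np.mean((a == x) & (b == y))
            if pxy > 0:
                mi += pxy * np.log(pxy / (np.mean(a == x) * np.mean(b == y)))
    return mi

def mi_permutation_test(a, b, n_perm=499, rng=rng):
    # permute cow IDs on one axis to simulate the null of no association
    obs = plugin_mi(a, b)
    null = np.array([plugin_mi(rng.permutation(a), b) for _ in range(n_perm)])
    p = (1 + np.sum(null >= obs)) / (n_perm + 1)
    return obs, p

# toy joint encoding: two axes that agree on all but two of 30 cows
a = np.repeat([1, 2], 15)
b = a.copy()
b[[0, 15]] = [2, 1]
obs, p = mi_permutation_test(a, b)
```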
To further complicate matters, however, we cannot necessarily assume that the magnitude of impact of a given latent factor will be uniform across behaviors, nor should we expect in a complex farm environment that behaviors will be influenced by a single latent factor. To overcome this logistical challenge without falling back on dubious a priori assumptions, the LIT package implements mutual information-based permutation tests on a grid, varying the cluster resolutions across both behavioral axes. Under the null hypothesis that no significant bivariate relationship exists between data streams, cow ID labels are randomly permuted within each tree, preserving the marginal distribution of the data along each axis but destroying any latent bivariate relationships. These permuted trees are then cut and the mutual information of the joint encoding estimated for each combination of cluster counts on the grid. A p-value is then generated by comparing the observed MI value of the joint encoding at each grid point against the corresponding distribution of MI values simulated under the null. Just as a scientist varies the focus of a microscope to bring microbes of different sizes into view, we can expect that geometric features of the joint probability distribution imposed by latent deterministic variables that vary in scale of impact will come into and fall out of resolution as these meta-parameters are varied across the grid of cluster counts. To help the user visually identify where such features have come into resolution, the LIT package also returns a heatmap visualization of the observed MI value for each grid point that is centered and scaled relative to the distribution of MI values under the null.
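The grid scan and the centered-and-scaled heatmap values can be sketched together. In this hypothetical Python toy (not the package's implementation), two behavioral axes share a single latent two-group factor; at each grid point the observed MI is converted to a z-score against its own permutation null, which is the quantity a heatmap cell would display.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(3)

def plugin_mi(a, b):
    # plug-in mutual information (nats) from the empirical joint table
    mi = 0.0
    for x in np.unique(a):
        for y in np.unique(b):
            pxy = np.mean((a == x) & (b == y))
            if pxy > 0:
                mi += pxy * np.log(pxy / (np.mean(a == x) * np.mean(b == y)))
    return mi

# two synthetic behavioral axes sharing one latent two-group factor
g = np.repeat([0.0, 1.0], 10)
X = g[:, None] + rng.normal(0.0, 0.1, (20, 3))
Y = g[:, None] + rng.normal(0.0, 0.1, (20, 3))
Zx = linkage(pdist(X), method="ward")
Zy = linkage(pdist(Y), method="ward")

# center and scale observed MI against the permutation null at each
# grid point -- these z-scores are what the heatmap cells would show
zscores = {}
for kx in (2, 3, 4):
    for ky in (2, 3, 4):
        a = fcluster(Zx, t=kx, criterion="maxclust")
        b = fcluster(Zy, t=ky, criterion="maxclust")
        null = np.array([plugin_mi(rng.permutation(a), b)
                         for _ in range(200)])
        zscores[(kx, ky)] = (plugin_mi(a, b) - null.mean()) / null.std()
```

Because the shared factor operates at the two-group scale, the z-score at the (2, 2) grid point stands far above the null, and the feature remains visible at finer cuts that subdivide the same two groups.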
For behavioral measurements subject to the influence of multiple biological and environmental factors operating simultaneously, this exhaustive approach to parameterization enables users not only to build a more complete picture of a complex behavioral system, but also to gain insights into the hierarchy of these behavioral responses. Unfortunately, as the resolution of the encodings is increased, MI estimates not only become less precise, but may also become less accurate. Bias is introduced when empirical estimates of the joint probability distribution become so granular that regions with low but nonzero probabilities go unsampled. These zero-count bins cause the total entropy calculated from the empirical joint probability distribution to be under-estimated, which in turn causes the relative amount of redundant information to be over-estimated. While the magnitude of this bias is partially dependent on the total sample size, it is also contingent on the structure of the joint probability distribution itself, namely the number of low-probability cells. Given that the joint probability distribution under the null, which is randomly permuted to intentionally remove any nonrandom features in the sample, can be expected to have a more uniform distribution of probability mass than the observed dataset, we can anticipate that the magnitude of this bias may differ between the two distributions as the sample becomes more granular, preventing MI estimates from being directly comparable.
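The zero-count-bin mechanism behind this bias is easy to demonstrate numerically. In this small illustration (toy numbers, not from the dataset), a joint encoding with 50 bins is sampled with only 30 animals; the plug-in entropy estimate falls well below the true entropy because empty bins drop out of the sum entirely.

```python
import numpy as np

rng = np.random.default_rng(4)

def plugin_entropy(counts):
    # Maximum-likelihood ("plug-in") entropy in nats; zero-count bins
    # contribute nothing, so sparse samples understate the true entropy
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log(p))

k = 50               # number of bins in the joint encoding
true_H = np.log(k)   # entropy of a uniform joint distribution
n = 30               # far fewer animals than bins

# average the plug-in estimate over repeated sparse samples
est = np.mean([
    plugin_entropy(np.bincount(rng.integers(0, k, n), minlength=k))
    for _ in range(500)
])
```

Since at most n bins can be occupied, the plug-in estimate can never exceed log(n), which here is already below the true value log(k); the under-estimated joint entropy is what inflates the apparent redundancy (MI) at fine resolutions.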