Using the USDA Quick Stats tool to find census data on Milk – Operations with Sales for California returns the years 1997, 2002, 2007, 2012 and 2017, with the county name and the number of operations for each.
All available years of the county crop reports were downloaded from the USDA NASS site as .CSV files and compiled in R Studio for processing and analysis. A total of 442 crop types were reported. After merging the 41 years into a single data frame, I filtered the dataset to include only milk-related commodities, of which there were three possible types: Milk Market Fluid, which refers to Grade A beverage milk; Milk Manufacturing, which refers to milk that is used to make butter, cheese or milk powder; and Milk Cow’s Unspecified, which simply means that the county did not distinguish between fluid and manufacturing milk. While they are reported differently depending on the county, these three types of milk are the same in terms of form and units. Milk is measured in hundredweights, notated as Cwt; one hundredweight is equal to 100 pounds, or 11.63 gallons of fluid milk. For the purpose of my analysis, I summed the production values for the three categories of milk to allow for comparison across all counties. Lastly, I pivoted the table to make the data compatible with a shape file for mapping, so that there was one row per county with a column for each year of production, 1980-2020. This pivot removes any associated data such as price per unit or value, which can be addressed in a separate data frame or re-joined if desired but was not included in this analysis. At this point, I identified holes in the data where the original county reports were missing production quantities for certain counties and years. These gaps may be due to errors in the compiled report, or to the county reporting only the total monetary value of the commodity without reporting the hundredweights of milk produced.
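The filtering, summing and pivoting steps described above can be sketched in a few lines. This is a minimal Python illustration and not the R code used for the analysis. The record layout, the exact spelling of the commodity labels in the CSV files, the function name `pivot_milk` and all of the numbers are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical long-format records: (county, year, commodity, production in Cwt).
# The values are invented for illustration; they are not taken from the crop reports.
records = [
    ("Tulare", 1980, "Milk Market Fluid", 1000.0),
    ("Tulare", 1980, "Milk Manufacturing", 250.0),
    ("Tulare", 1981, "Milk Cow's Unspecified", 1300.0),
    ("Marin", 1980, "Milk Market Fluid", 400.0),
    ("Marin", 1980, "Hay Alfalfa", 999.0),  # not a milk commodity, so it is dropped
]

# The three milk categories named in the text (label spelling is an assumption).
MILK_COMMODITIES = {
    "Milk Market Fluid",
    "Milk Manufacturing",
    "Milk Cow's Unspecified",
}


def pivot_milk(records, years):
    """Sum the milk categories per county and year, then pivot to wide form.

    Returns {county: {year: total Cwt or None}}; None marks a county-year
    with no reported milk production (a hole in the data).
    """
    totals = defaultdict(float)
    for county, year, commodity, cwt in records:
        if commodity in MILK_COMMODITIES:
            totals[(county, year)] += cwt
    counties = sorted({county for county, _ in totals})
    return {
        county: {year: totals.get((county, year)) for year in years}
        for county in counties
    }


wide = pivot_milk(records, years=[1980, 1981])
```

In this sketch, missing county-years are kept as `None` rather than zero, so that unreported production stays distinguishable from a reported zero before any gaps are filled.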
Table 2 summarizes the counties and years that were missing data, and the steps I took to fill in the data. I first searched the county crop reports online directly and filled in the blank cells in an Excel sheet.
If the crop reports did not report milk production quantities for that year, I used a simple average formula to interpolate the data from the previous and following year so as not to leave any blank cells, which would appear the same as zeros on the maps. For a few counties, the reports from 2018 onward were missing, so I extrapolated numbers from the most recent year. In the end, 41 of California’s 58 counties remained in the dataset, meaning that 41 counties reported milk production at least once between 1980 and 2020. Of these 41 counties, 22 had milk production reported for the full time period, while 18 counties stopped reporting milk production at some point between 1981 and 2014.
California has 58 counties; the smallest is San Francisco County and the largest is San Bernardino. These counties are grouped into eight agricultural districts by the CA County Agriculture Commissioner’s Data Listing. The initial analysis is presented at the county level for all the Census years and decades, after which the data is aggregated into regions, as shown in Figure 3, to show the broader patterns. Based on the history of California’s dairy industry and the current milk production rates, I distinguish Marin and Sonoma from the Central Coast counties as their own agricultural region in the context of dairy production, creating nine regions.
This research involves two sets of spatial-temporal data from 58 counties spanning 41 years. When visualizing changes over space and time, a static map or time-based chart will inevitably sacrifice nuances, obscuring either changes through the years or spatial relationships between counties. I developed a methodology to map changes over time using a sequence of maps designed to be viewed in succession, either by flipping through full pages or as an animation in a GIF file. The result is a unique visual of the data that captures the spatial relationships while maintaining the temporal resolution.
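As a sketch of the gap-filling rule described earlier in this section (averaging the previous and following year, and carrying the most recent value forward where the final years are missing), the following Python function shows one possible reading. The filling for this analysis was done in Excel; the function name `fill_gaps`, the `carry_forward` flag and the sample values are hypothetical. For a run of more than one missing year, the sketch averages the nearest reported values on either side, which is an assumption the text does not spell out.

```python
def fill_gaps(values, carry_forward=False):
    """Fill None entries in a list of yearly production values.

    Interior gaps take the mean of the nearest reported values before and
    after the gap. Trailing gaps are filled with the last reported value
    only when carry_forward is True. Leading gaps are left as None.
    """
    filled = list(values)
    n = len(values)
    for i in range(n):
        if values[i] is not None:
            continue
        # Nearest reported value before and after position i (original data only).
        prev_val = next(
            (values[j] for j in range(i - 1, -1, -1) if values[j] is not None), None
        )
        next_val = next(
            (values[j] for j in range(i + 1, n) if values[j] is not None), None
        )
        if prev_val is not None and next_val is not None:
            filled[i] = (prev_val + next_val) / 2
        elif prev_val is not None and carry_forward:
            filled[i] = prev_val
    return filled


# Invented series: one interior hole and two missing final years.
example = [100.0, None, 120.0, 130.0, None, None]
filled_example = fill_gaps(example, carry_forward=True)
```

The `carry_forward` flag is off by default because carrying values forward is only appropriate for counties whose recent reports are missing, not for counties that stopped producing milk.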
To provide further detail, the maps are supplemented with data tables summarized by decade and with basic calculations like percent share and percent change to quantify the effect that the maps give visually.
To visualize the milk production and dairy operations data I had acquired, I created bubble maps, or maps using proportional symbols, to address the problem of using county-level data in symbolizing quantities. A map with proportional symbols uses point sizes to represent the number of farms per county, or the volume of milk produced. This is an alternative to a choropleth map, in which a color gradient fills the shape of the county. While there are benefits and drawbacks to both types of symbology, I decided to use proportional symbols because the counties in California vary widely in size, and the size of a county would affect the visual weight of its color. I executed the same steps to create the maps of milk production and number of operations for all years, as follows. There are at least two methods to create maps of proportional symbols, and I used both: the first in R, to create a series of 41 maps automatically, and the second in ArcGIS Pro, for more detailed cartographic design. In R Studio, I created centroids of each county based on a shape file of California County Boundaries, and then joined the processed and interpolated data to the shape file based on county name. I used the “tmap” package with the “tm_bubbles” function to create circles based on the quantity of milk production or operations within each county. I set “size.max” to the maximum production quantity, or number of operations, for the full dataset to standardize symbology across all the years. I set the scale to 5, the style to “fixed,” and the breaks to 2,500,000 for production. I used a function to iterate over each column of the dataset to create maps of production for all 41 years automatically. In ArcGIS Pro, the visualization was almost identical but the steps to create the maps were different.
I symbolized the data from each year using proportional symbols and set the maximum point size for each year to a fraction of that year’s maximum value to standardize the sizes of the symbols across all years. For the production maps, this fraction was one over one million: for example, in 1980 the maximum production value was 24,711,000 Cwt, so I set the maximum symbol size for the 1980 map to 24.7 points.
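The arithmetic behind this sizing rule is small enough to write out. The Python sketch below only illustrates the calculation; the symbol sizes for the maps were entered by hand in ArcGIS Pro, and the function name `symbol_size_pt` and its parameters are hypothetical. The 1 pt floor matches the minimum symbol size reported for the production maps.

```python
def symbol_size_pt(value, fraction, min_pt=1.0):
    """Convert a data value to a symbol size in points.

    The size is a fixed fraction of the value, so the same value maps to the
    same size in every year. Sizes below min_pt are raised to min_pt.
    """
    return max(value * fraction, min_pt)


# Production maps: the fraction is one over one million.
PRODUCTION_FRACTION = 1 / 1_000_000

# 1980 maximum production of 24,711,000 Cwt, from the text.
max_size_1980 = round(symbol_size_pt(24_711_000, PRODUCTION_FRACTION), 1)

# A hypothetical small value below half a million Cwt falls back to the 1 pt floor.
small_size = symbol_size_pt(300_000, PRODUCTION_FRACTION)
```

Because the fraction is constant, a symbol of a given size stands for the same quantity on every map in the series, which is what allows the maps to be compared across years.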
The minimum symbol size for production was 1 pt throughout all years because the minimum production values were always less than half a million. For the operations maps, the fraction was one fifth: in 1997 the maximum number of operations in a county was 325, so I set the maximum symbol size to 65 points. The minimum symbol size was calculated the same way, but the minimum number was always less than 6, so the minimum point size was always one. Taking the same fraction of each year’s minimum and maximum created a uniform scale of point sizes across the multiple years. If I were to leave the minimum and maximum symbol sizes the same for multiple years, the values would be stretched or compressed within that range, and the sizes would not correspond to the same values across different years. This did mean that each of the years had to be symbolized individually, which I did for five decades for the production maps and five years for the operations maps. Since the maps in ArcGIS were created to be static maps published in this thesis submission, or a journal submission, where a GIF animation will not work, I decided to average the 41 years of production data by decade, so as not to be forced to cherry-pick five single years, thus losing 36 years of data, when creating a series of five maps. While averaging the decades does reduce the temporal resolution, the averages still represent the general annual production rate of each county. I calculated the average annual production in Excel for 1980-1989, 1990-1999, 2000-2009, and 2010-2019. I left 2020 on its own and symbolized it on its own map, since the data seemed dissimilar to the production rates of even a few years before, and since it is the start of a new decade.
There are a few limitations involving the quality, time period and scale of the data that must be acknowledged. The data had 45 counties in the production dataset and 55 counties in the operations dataset.
This points to potential flaws and differences in both the county crop reports and the Census of Agriculture data. The annual county crop reports are compiled from independent reports; they do not claim to cover all of agriculture or to report everything. In the reports, counties aggregate the revenues of many commodities and may exclude small quantities. This may result in underestimates of production. Missing data will also increase errors in the percent share calculation for other counties. As described in the data processing subsection, missing data was addressed using interpolation. The Census of Agriculture has the opposite problem: any operation with milk sales is included in the dataset regardless of size or actual commercial dairy status, possibly resulting in overestimates of the number of active dairy operations. The data is also limited to the time period of available data, and the two datasets do not cover the same length of time.
The county crop reports were available annually from 1980, while the Census of Agriculture was only available every 5 years beginning in 1997. This limits the capacity for comparison of production and number of operations by year, as the beginning dates are 17 years apart and the resolution of the data is so different. This also keeps our analysis in the contemporary period, whereas we know from our literature review that milk has been produced commercially in California since the late 1800s, about 150 years. Finally, there is the problem of resolution and geolocation. The finest-scale data readily available is at the county level. The county sizes vary greatly, and many of them overlap with mountain ranges or deserts that are not agriculturally productive, therefore diluting the actual area. When I use proportional symbols, they are plotted in the center of the county, or the centroid. For large counties like San Bernardino and Tulare, where more than half of the county is desert or mountains respectively, this can create a misleading effect in the maps. This problem persists, and is possibly made worse, when I use graduated colors or dot density maps, as the full area is filled with color or dots while actual milk production may occur only in a small corner of the county. This problem would be ameliorated with more information on the locations of the dairies. Without it, the dots should not be read as the locations where milk production occurs in the county, but as symbolic markers of the county’s production rates.
The following two sections contain maps visualizing the data collected on milk production and operations at the county level, each followed by tables of the quantities and percent shares of production and operations that the maps were based on, for more detail.
The California Agricultural Commissioner’s Annual Crop Reports provide data on individual county milk production rates from 1980-2020. Figures 4-8 are maps of the average milk production by decade. Table 3 shows the average annual volume of milk produced for each county by decade, and Table 4 shows each county’s percent share of the total average annual production. Table 5 shows the actual change and percent change between the 1980s and 2010s of average annual production for each county. The Census of Agriculture provides data on the number of operations with milk sales for each county for the years 1997, 2002, 2007, 2012 and 2017. Figures 9-13 are maps showing the decreasing number of operations by county through all 5 of the Census years. Table 6 shows the quantities and percent share of operations for each county for all 5 years. Table 7 shows the actual change and percent change in number of operations between 1997 and 2017.
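The quantities summarized in these tables (average annual production by decade, percent share of the total, and actual and percent change) follow from simple formulas, sketched here in Python. The averages for this analysis were calculated in Excel; the function names, the two placeholder counties and all of the values below are invented for illustration and are not results from the tables.

```python
def decade_average(yearly, start, end):
    """Mean annual value over the years start..end inclusive."""
    values = [yearly[year] for year in range(start, end + 1)]
    return sum(values) / len(values)


def percent_share(averages):
    """Each county's share of the summed averages, in percent."""
    total = sum(averages.values())
    return {county: 100 * value / total for county, value in averages.items()}


def percent_change(old, new):
    """Percent change from old to new; None when old is zero (undefined)."""
    if old == 0:
        return None
    return 100 * (new - old) / old


# Invented yearly production (Cwt) for two placeholder counties.
county_a = {year: 100.0 for year in range(1980, 1990)}
county_a.update({year: 200.0 for year in range(2010, 2020)})
county_b = {year: 300.0 for year in range(1980, 1990)}
county_b.update({year: 200.0 for year in range(2010, 2020)})
yearly = {"County A": county_a, "County B": county_b}

avg_1980s = {c: decade_average(v, 1980, 1989) for c, v in yearly.items()}
avg_2010s = {c: decade_average(v, 2010, 2019) for c, v in yearly.items()}
share_1980s = percent_share(avg_1980s)
actual_change = {c: avg_2010s[c] - avg_1980s[c] for c in yearly}
pct_change = {c: percent_change(avg_1980s[c], avg_2010s[c]) for c in yearly}
```

Note that percent share depends on the total across all counties, so a county with missing or underreported production shifts the shares of every other county.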