The vote of each tree is tallied; then the unknown sample is assigned the class in which it received the most votes. For each tree, a subset of the variables is used to develop the split. Also, a variable importance ranking is derived with the algorithm. Analysts can evaluate the ranking and identify variables that are relevant to the model for prediction or classification problems, leading to a reduction in the number of variables used for the analysis. Random forest ranks very well among other classifiers. Recently, it was ranked in the top ten of 100 classifiers tested for classification purposes. It is capable of processing large datasets, can analyze numerous variables without deletion, is robust for analysis of datasets with missing variables, and has the ability to process unbalanced datasets. Models derived for classification can be used on other datasets. Light reflectance data recorded from sensors onboard satellite, airborne, and ground-based platforms have shown promise as input data for random forest to use for classification and regression problems. Currently, no information is available on using vegetation indices derived from multispectral data as input into random forest for soybean weed discrimination. The objective of this study was to evaluate normalized difference vegetation indices as input into random forest to differentiate soybeans and three broadleaf weeds: Palmer amaranth, redroot pigweed, and velvetleaf.
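The vote-tallying step described above can be illustrated with a minimal Python sketch. It is not the implementation used in this study; the function name `majority_vote` and the example votes are hypothetical, and ties are not handled specially.

```python
from collections import Counter


def majority_vote(tree_votes):
    """Return the class label that received the most votes across trees."""
    counts = Counter(tree_votes)
    # most_common(1) returns a one-element list holding (label, vote count).
    label, _ = counts.most_common(1)[0]
    return label


# Hypothetical votes from five trees for one unknown leaf sample.
votes = ["soybean", "velvetleaf", "soybean", "Palmer amaranth", "soybean"]
predicted = majority_vote(votes)
print(predicted)
```

In a full random forest, each vote would come from a tree grown on a different sample of the training data, with a subset of the variables considered at each split.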
The study focused on leaf multispectral reflectance data of simulated WorldView-3 satellite sensor bands. Two experiments were completed in 2014 in which 30 replicates of soybean variety 4928LL, Palmer amaranth, redroot pigweed, and velvetleaf were grown in a greenhouse located at USDA-ARS, Stoneville, MS. Planting dates were June 13, 2014, and August 28, 2014, for experiments one and two, respectively. Seeds of the different plants were sown into plugs. After emergence, the plant species were transferred to two-liter pots. All plant species were exposed to a fourteen-hour photoperiod, and light was supplemented at the beginning and end of the day with sodium vapor lamps. Day/night temperatures of the greenhouse were maintained at 28 °C/24 °C ± 3 °C, respectively. The soybean variety used in the study had an indeterminate growth habit and gray pubescence and was assigned to maturity group 4.9. The weed seeds were obtained from a seed bank maintained at the laboratory. Leaf reflectance measurements were obtained with a plant contact probe attached to a spectroradiometer sensitive to a spectral range of 350 to 2500 nm. The contact probe has its own light source, allowing the user to collect data at any time during the day or night. Reflectance measurements were collected from the most recently matured leaf of each plant; each reading was an average of fifteen readings. The spectroradiometer was calibrated at fifteen-minute intervals with a white Spectralon panel. Leaf reflectance measurements were collected before the plants reached 1 foot in height. The goal of weed management strategies is to detect and kill the weeds in the vegetative growth stages and before the seeds reach full maturity. Broadband and narrowband data have been used to develop vegetation indices.
Therefore, the simulated band center wavelengths closest to the center wavelengths used in other studies were employed in developing each vegetation index. In some cases, two vegetation indices were developed for a designated index because the simulated band centers were equidistant from the band centers used by other investigators. These include the advanced normalized difference vegetation index, shortwave infrared water stress index, and structure insensitive pigment index. The hsdar package of R and base R packages were used to develop the sixteen spectral bands and the twelve vegetation indices, respectively. The conditional inference version of random forest (cforest) was used for the classifications. Cforest is more stable in deriving variable importance values in the presence of highly correlated variables, thus providing better accuracy in calculating variable importance. Differences of cforest compared with the original version of random forest are as follows. Cforest employs conditional inference trees as base learners; random forest uses classification and regression trees as base learners. Cforest develops unbiased decision trees based on subsampling without replacement instead of using bootstrap samples. The algorithm uses the conditional permutation scheme described by Strobl et al. to determine variable importance ranking.
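The band-matching rule described at the start of this section, including the equidistant case that yields two versions of one index, can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the R workflow used in the study: the band centers, target wavelengths, and reflectance values are hypothetical placeholders, and the function names are invented for the example. The only formula assumed is the general normalized difference form, (a - b) / (a + b).

```python
def nearest_band_centers(target_nm, band_centers_nm):
    """Return every simulated band center tied for closest to target_nm."""
    smallest_gap = min(abs(c - target_nm) for c in band_centers_nm)
    return [c for c in band_centers_nm if abs(c - target_nm) == smallest_gap]


def normalized_difference(band_a, band_b):
    """Normalized difference of two reflectance values: (a - b) / (a + b)."""
    return (band_a - band_b) / (band_a + band_b)


# Hypothetical band centers (nm) and leaf reflectance values, illustration only.
band_centers = [450.0, 550.0, 650.0, 750.0, 850.0]
reflectance = {450.0: 0.05, 550.0: 0.12, 650.0: 0.06, 750.0: 0.40, 850.0: 0.48}

# Hypothetical target wavelengths taken from "other studies".
red_matches = nearest_band_centers(670.0, band_centers)  # one closest band
nir_matches = nearest_band_centers(800.0, band_centers)  # two equidistant bands

# One index per (near-infrared, red) pairing; a tie produces two indices.
indices = {
    (nir, red): normalized_difference(reflectance[nir], reflectance[red])
    for nir in nir_matches
    for red in red_matches
}

for bands, value in indices.items():
    print(bands, round(value, 3))
```

Returning all tied band centers, rather than picking one arbitrarily, mirrors the choice described above of developing two indices when the simulated band centers are equidistant from the published one.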