diff options
Diffstat (limited to 'vignettes/prebuilt/2023_mesotrione_parent.rmd')
-rw-r--r-- | vignettes/prebuilt/2023_mesotrione_parent.rmd | 596 |
1 files changed, 596 insertions, 0 deletions
diff --git a/vignettes/prebuilt/2023_mesotrione_parent.rmd b/vignettes/prebuilt/2023_mesotrione_parent.rmd new file mode 100644 index 00000000..f11effc7 --- /dev/null +++ b/vignettes/prebuilt/2023_mesotrione_parent.rmd @@ -0,0 +1,596 @@ +--- +title: "Testing covariate modelling in hierarchical parent degradation kinetics with residue data on mesotrione" +author: Johannes Ranke +date: Last change on 4 August 2023, last compiled on `r format(Sys.time(), "%e + %B %Y")` +output: + pdf_document: + extra_dependencies: ["float", "listing"] +toc: yes +geometry: margin=2cm +--- + +```{r setup, echo = FALSE, cache = FALSE} +options(width = 80) # For summary listings +knitr::opts_chunk$set( + cache = TRUE, + comment = "", tidy = FALSE, + fig.pos = "H", fig.align = "center" +) +``` + +\clearpage + +# Introduction + +The purpose of this document is to test demonstrate how nonlinear hierarchical +models (NLHM) based on the parent degradation models SFO, FOMC, DFOP and HS +can be fitted with the mkin package, also considering the influence of +covariates like soil pH on different degradation parameters. Because in some +other case studies, the SFORB parameterisation of biexponential decline has +shown some advantages over the DFOP parameterisation, SFORB was included +in the list of tested models as well. + +The mkin package is used in version `r packageVersion("mkin")`, which is +contains the functions that were used for the evaluations. The `saemix` package +is used as a backend for fitting the NLHM, but is also loaded to make the +convergence plot function available. + +This document is processed with the `knitr` package, which also provides the +`kable` function that is used to improve the display of tabular data in R +markdown documents. For parallel processing, the `parallel` package is used. + +```{r packages, cache = FALSE, message = FALSE} +library(mkin) +library(knitr) +library(saemix) +library(parallel) +n_cores <- detectCores() +if (Sys.info()["sysname"] == "Windows") { + cl <- makePSOCKcluster(n_cores) +} else { + cl <- makeForkCluster(n_cores) +} +``` + +\clearpage + +## Test data + +```{r data} +data_file <- system.file( + "testdata", "mesotrione_soil_efsa_2016.xlsx", package = "mkin") +meso_ds <- read_spreadsheet(data_file, parent_only = TRUE) +``` + +The following tables show the covariate data and the `r length(meso_ds)` +datasets that were read in from the spreadsheet file. + +```{r show-covar-data, dependson = "data", results = "asis"} +pH <- attr(meso_ds, "covariates") +kable(pH, caption = "Covariate data") +``` + +\clearpage + +```{r show-data, dependson = "data", results = "asis"} +for (ds_name in names(meso_ds)) { + print( + kable(mkin_long_to_wide(meso_ds[[ds_name]]), + caption = paste("Dataset", ds_name), + booktabs = TRUE, row.names = FALSE)) +} +``` + +\clearpage + +# Separate evaluations + +In order to obtain suitable starting parameters for the NLHM fits, +separate fits of the five models to the data for each +soil are generated using the `mmkin` function from the mkin package. +In a first step, constant variance is assumed. Convergence +is checked with the `status` function. + +```{r f-sep-const, dependson = "data"} +deg_mods <- c("SFO", "FOMC", "DFOP", "SFORB", "HS") +f_sep_const <- mmkin( + deg_mods, + meso_ds, + error_model = "const", + cluster = cl, + quiet = TRUE) +``` + +```{r dependson = "f-sep-const"} +status(f_sep_const[, 1:5]) |> kable() +status(f_sep_const[, 6:18]) |> kable() +``` + +In the tables above, OK indicates convergence and C indicates failure to +converge. Most separate fits with constant variance converged, with the +exception of two FOMC fits, one SFORB fit and one HS fit. + +```{r f-sep-tc, dependson = "f-sep-const"} +f_sep_tc <- update(f_sep_const, error_model = "tc") +``` + +```{r dependson = "f-sep-tc"} +status(f_sep_tc[, 1:5]) |> kable() +status(f_sep_tc[, 6:18]) |> kable() +``` + +With the two-component error model, the set of fits that did not converge +is larger, with convergence problems appearing for a number of non-SFO fits. + +\clearpage + +# Hierarchical model fits without covariate effect + +The following code fits hierarchical kinetic models for the ten combinations of +the five different degradation models with the two different error models in +parallel. + +```{r f-saem-1, dependson = c("f-sep-const", "f-sep-tc")} +f_saem_1 <- mhmkin(list(f_sep_const, f_sep_tc), cluster = cl) +status(f_saem_1) |> kable() +``` + +All fits terminate without errors (status OK). + +```{r dependson = "f-saem-1"} +anova(f_saem_1) |> kable(digits = 1) +``` +The model comparisons show that the fits with constant variance are consistently +preferable to the corresponding fits with two-component error for these data. +This is confirmed by the fact that the parameter `b.1` (the relative standard +deviation in the fits obtained with the saemix package), is ill-defined in all +fits. + +```{r dependson = "f-saem-1"} +illparms(f_saem_1) |> kable() +``` + +For obtaining fits with only well-defined random effects, we update +the set of fits, excluding random effects that were ill-defined +according to the `illparms` function. + +```{r f-saem-2, dependson = "f-saem-1"} +f_saem_2 <- update(f_saem_1, no_random_effect = illparms(f_saem_1)) +status(f_saem_2) |> kable() +``` + +The updated fits terminate without errors. + +```{r dependson = "f-saem-2"} +illparms(f_saem_2) |> kable() +``` + +No ill-defined errors remain in the fits with constant variance. + +\clearpage + +# Hierarchical model fits with covariate effect + +In the following sections, hierarchical fits including a model for the +influence of pH on selected degradation parameters are shown for all parent +models. Constant variance is selected as the error model based on the fits +without covariate effects. Random effects that were ill-defined in the fits +without pH influence are excluded. A potential influence of the soil pH is only +included for parameters with a well-defined random effect, because experience +has shown that only for such parameters a significant pH effect could be found. + +## SFO + +```{r sfo-pH, dependson = "f-sep-const"} +sfo_pH <- saem(f_sep_const["SFO", ], no_random_effect = "meso_0", covariates = pH, + covariate_models = list(log_k_meso ~ pH)) +``` + +```{r dependson = "sfo-pH"} +summary(sfo_pH)$confint_trans |> kable(digits = 2) +``` + +The parameter showing the pH influence in the above table is +`beta_pH(log_k_meso)`. Its confidence interval does not include zero, +indicating that the influence of soil pH on the log of the degradation rate +constant is significantly greater than zero. + + +```{r dependson = "sfo-pH"} +anova(f_saem_2[["SFO", "const"]], sfo_pH, test = TRUE) +``` + +The comparison with the SFO fit without covariate effect confirms that +considering the soil pH improves the model, both by comparison of AIC and BIC +and by the likelihood ratio test. + +\clearpage + +```{r dependson = "sfo-pH", plot.height = 5} +plot(sfo_pH) +``` + +Endpoints for a model with covariates are by default calculated for +the median of the covariate values. This quantile can be adapted, +or a specific covariate value can be given as shown below. + +```{r dependson = "sfo-pH", plot.height = 5} +endpoints(sfo_pH) +endpoints(sfo_pH, covariate_quantile = 0.9) +endpoints(sfo_pH, covariates = c(pH = 7.0)) +``` + +\clearpage + +## FOMC + +```{r fomc-pH, dependson = "f-sep-const"} +fomc_pH <- saem(f_sep_const["FOMC", ], no_random_effect = "meso_0", covariates = pH, + covariate_models = list(log_alpha ~ pH)) +``` + +```{r dependson = "fomc-pH"} +summary(fomc_pH)$confint_trans |> kable(digits = 2) +``` + +As in the case of SFO, the confidence interval of the slope parameter +(here `beta_pH(log_alpha)`) quantifying the influence of soil pH +does not include zero, and the model comparison clearly indicates +that the model with covariate influence is preferable. +However, the random effect for `alpha` is not well-defined any +more after inclusion of the covariate effect (the confidence +interval of `SD.log_alpha` includes zero). + +```{r dependson = "fomc-pH"} +illparms(fomc_pH) +``` + +Therefore, the model is updated without this random effect, and +no ill-defined parameters remain. + +```{r fomc-pH-2, dependson = "fomc_pH"} +fomc_pH_2 <- update(fomc_pH, no_random_effect = c("meso_0", "log_alpha")) +illparms(fomc_pH_2) +``` + +```{r dependson = "fomc-pH-2"} +anova(f_saem_2[["FOMC", "const"]], fomc_pH, fomc_pH_2, test = TRUE) +``` + +Model comparison indicates that including pH dependence significantly improves +the fit, and that the reduced model with covariate influence results in +the most preferable FOMC fit. + +```{r dependson = "fomc-pH"} +summary(fomc_pH_2)$confint_trans |> kable(digits = 2) +``` + +\clearpage + +```{r dependson = "fomc-pH", plot.height = 5} +plot(fomc_pH_2) +``` + +```{r dependson = "fomc-pH", plot.height = 5} +endpoints(fomc_pH_2) +endpoints(fomc_pH_2, covariates = c(pH = 7)) +``` + +\clearpage + +## DFOP + +In the DFOP fits without covariate effects, random effects for two degradation +parameters (`k2` and `g`) were identifiable. + +```{r dependson = "f-saem-2"} +summary(f_saem_2[["DFOP", "const"]])$confint_trans |> kable(digits = 2) +``` + +A fit with pH dependent degradation parameters was obtained by excluding +the same random effects as in the refined DFOP fit without covariate influence, +and including covariate models for the two identifiable parameters `k2` and `g`. + +```{r dfop-pH, dependson = "f-sep-const"} +dfop_pH <- saem(f_sep_const["DFOP", ], no_random_effect = c("meso_0", "log_k1"), + covariates = pH, + covariate_models = list(log_k2 ~ pH, g_qlogis ~ pH)) +``` + +The corresponding parameters for +the influence of soil pH are `beta_pH(log_k2)` for the influence of soil pH on +`k2`, and `beta_pH(g_qlogis)` for its influence on `g`. + +```{r dependson = "dfop-pH"} +summary(dfop_pH)$confint_trans |> kable(digits = 2) +illparms(dfop_pH) +``` + +Confidence intervals for neither of them include zero, indicating a significant +difference from zero. However, the random effect for `g` is now ill-defined. +The fit is updated without this ill-defined random effect. + +```{r dfop-pH-2, dependson = "dfop-pH"} +dfop_pH_2 <- update(dfop_pH, + no_random_effect = c("meso_0", "log_k1", "g_qlogis")) +illparms(dfop_pH_2) +``` + +Now, the slope parameter for the pH effect on `g` is ill-defined. +Therefore, another attempt is made without the corresponding covariate model. + +```{r dfop-pH-3, dependson = "f-sep-const"} +dfop_pH_3 <- saem(f_sep_const["DFOP", ], no_random_effect = c("meso_0", "log_k1"), + covariates = pH, + covariate_models = list(log_k2 ~ pH)) +illparms(dfop_pH_3) +``` +As the random effect for `g` is again ill-defined, the fit is repeated without it. + +```{r dfop-pH-4, dependson = "dfop-pH-3"} +dfop_pH_4 <- update(dfop_pH_3, no_random_effect = c("meso_0", "log_k1", "g_qlogis")) +illparms(dfop_pH_4) +``` + +While no ill-defined parameters remain, model comparison suggests that the previous +model `dfop_pH_2` with two pH dependent parameters is preferable, based on +information criteria as well as based on the likelihood ratio test. + +```{r dependson = "dfop-pH-4"} +anova(f_saem_2[["DFOP", "const"]], dfop_pH, dfop_pH_2, dfop_pH_3, dfop_pH_4) +anova(dfop_pH_2, dfop_pH_4, test = TRUE) +``` + +When focussing on parameter identifiability using the test if the confidence +interval includes zero, `dfop_pH_4` would still be the preferred model. +However, it should be kept in mind that parameter confidence intervals are +constructed using a simple linearisation of the likelihood. As the confidence +interval of the random effect for `g` only marginally includes zero, +it is suggested that this is acceptable, and that `dfop_pH_2` can be considered +the most preferable model. + +\clearpage + +```{r dependson = "dfop-pH-2", plot.height = 5} +plot(dfop_pH_2) +``` + +```{r dependson = "dfop-pH-2", plot.height = 5} +endpoints(dfop_pH_2) +endpoints(dfop_pH_2, covariates = c(pH = 7)) +``` + +\clearpage + +## SFORB + +```{r sforb-pH, dependson = "f-sep-const"} +sforb_pH <- saem(f_sep_const["SFORB", ], no_random_effect = c("meso_free_0", "log_k_meso_free_bound"), + covariates = pH, + covariate_models = list(log_k_meso_free ~ pH, log_k_meso_bound_free ~ pH)) +``` + +```{r dependson = "sforb-pH"} +summary(sforb_pH)$confint_trans |> kable(digits = 2) +``` + +The confidence interval of `beta_pH(log_k_meso_bound_free)` includes zero, +indicating that the influence of soil pH on `k_meso_bound_free` cannot reliably +be quantified. Also, the confidence interval for the random effect on this +parameter (`SD.log_k_meso_bound_free`) includes zero. + +Using the `illparms` function, these ill-defined parameters can be found +more conveniently. + +```{r dependson = "sforb-pH"} +illparms(sforb_pH) +``` + +To remove the ill-defined parameters, a second variant of the SFORB model +with pH influence is fitted. No ill-defined parameters remain. + +```{r sforb-pH-2, dependson = "f-sforb-pH"} +sforb_pH_2 <- update(sforb_pH, + no_random_effect = c("meso_free_0", "log_k_meso_free_bound", "log_k_meso_bound_free"), + covariate_models = list(log_k_meso_free ~ pH)) +illparms(sforb_pH_2) +``` + +The model comparison of the SFORB fits includes the refined model without +covariate effect, and both versions of the SFORB fit with covariate effect. + +```{r dependson = "sforb-pH-2"} +anova(f_saem_2[["SFORB", "const"]], sforb_pH, sforb_pH_2, test = TRUE) +``` +The first model including pH influence is preferable based on information criteria +and the likelihood ratio test. However, as it is not fully identifiable, +the second model is selected. + +```{r dependson = "sforb-pH-2"} +summary(sforb_pH_2)$confint_trans |> kable(digits = 2) +``` +\clearpage + +```{r dependson = "sforb-pH-2", plot.height = 5} +plot(sforb_pH_2) +``` + +```{r dependson = "sforb-pH-2", plot.height = 5} +endpoints(sforb_pH_2) +endpoints(sforb_pH_2, covariates = c(pH = 7)) +``` + +\clearpage + +## HS + +```{r hs-pH, dependson = "f-sep-const"} +hs_pH <- saem(f_sep_const["HS", ], no_random_effect = c("meso_0"), + covariates = pH, + covariate_models = list(log_k1 ~ pH, log_k2 ~ pH, log_tb ~ pH)) +``` + +```{r dependson = "hs-pH"} +summary(hs_pH)$confint_trans |> kable(digits = 2) +illparms(hs_pH) +``` + +According to the output of the `illparms` function, the random effect on +the break time `tb` cannot reliably be quantified, neither can the influence of +soil pH on `tb`. The fit is repeated without the corresponding covariate +model, and no ill-defined parameters remain. + +```{r hs-pH-2, dependson = "hs-pH"} +hs_pH_2 <- update(hs_pH, covariate_models = list(log_k1 ~ pH, log_k2 ~ pH)) +illparms(hs_pH_2) +``` + +Model comparison confirms that this model is preferable to the fit without +covariate influence, and also to the first version with covariate influence. + +```{r dependson = c("hs-pH-2", "hs-pH-3")} +anova(f_saem_2[["HS", "const"]], hs_pH, hs_pH_2, test = TRUE) +``` + +```{r dependson = "hs-pH-2"} +summary(hs_pH_2)$confint_trans |> kable(digits = 2) +``` + +\clearpage + +```{r dependson = "hs-pH-2", plot.height = 5} +plot(hs_pH_2) +``` + +```{r dependson = "hs-pH-2", plot.height = 5} +endpoints(hs_pH_2) +endpoints(hs_pH_2, covariates = c(pH = 7)) +``` + +\clearpage + +## Comparison across parent models + +After model reduction for all models with pH influence, they are compared with +each other. + +```{r, dependson = c("sfo-pH-2", "fomc-pH-2", "dfop-pH-4", "sforb-pH-1", "hs-pH-3")} +anova(sfo_pH, fomc_pH_2, dfop_pH_2, dfop_pH_4, sforb_pH_2, hs_pH_2) +``` + +The DFOP model with pH influence on `k2` and `g` and a random effect only on +`k2` is finally selected as the best fit. + +The endpoints resulting from this model are listed below. Please refer to the +Appendix for a detailed listing. + +```{r, dependson = "dfop-pH-2"} +endpoints(dfop_pH_2) +endpoints(dfop_pH_2, covariates = c(pH = 7)) +``` + +# Conclusions + +These evaluations demonstrate that covariate effects can be included +for all types of parent degradation models. These models can then +be further refined to make them fully identifiable. + +\clearpage + +# Appendix + +## Hierarchical fit listings + +### Fits without covariate effects + +```{r, cache = FALSE, results = "asis", echo = FALSE} +errmods <- c(const = "constant variance", tc = "two-component error") +for (deg_mod in deg_mods) { + for (err_mod in c("const")) { + fit <- f_saem_1[[deg_mod, err_mod]] + if (!inherits(fit$so, "try-error")) { + caption <- paste("Hierarchical", deg_mod, "fit with", errmods[err_mod]) + tex_listing(fit, caption) + } + } +} +``` + +### Fits with covariate effects + +```{r listing-sfo, cache = FALSE, results = "asis", echo = FALSE} +tex_listing(sfo_pH, "Hierarchichal SFO fit with pH influence") +``` + +\clearpage + +```{r, cache = FALSE, results = "asis", echo = FALSE} +tex_listing(fomc_pH, "Hierarchichal FOMC fit with pH influence") +``` + +\clearpage + +```{r, cache = FALSE, results = "asis", echo = FALSE} +tex_listing(fomc_pH_2, "Refined hierarchichal FOMC fit with pH influence") +``` + +\clearpage + +```{r, cache = FALSE, results = "asis", echo = FALSE} +tex_listing(dfop_pH, "Hierarchichal DFOP fit with pH influence") +``` + +\clearpage + +```{r, cache = FALSE, results = "asis", echo = FALSE} +tex_listing(dfop_pH_2, "Refined hierarchical DFOP fit with pH influence") +``` + +\clearpage + +```{r, cache = FALSE, results = "asis", echo = FALSE} +tex_listing(dfop_pH_4, "Further refined hierarchical DFOP fit with pH influence") +``` + +\clearpage + +```{r, cache = FALSE, results = "asis", echo = FALSE} +tex_listing(sforb_pH, "Hierarchichal SFORB fit with pH influence") +``` +\clearpage + +```{r, cache = FALSE, results = "asis", echo = FALSE} +tex_listing(sforb_pH_2, "Refined hierarchichal SFORB fit with pH influence") +``` +\clearpage + +```{r, cache = FALSE, results = "asis", echo = FALSE} +tex_listing(hs_pH, "Hierarchichal HS fit with pH influence") +``` +\clearpage + +```{r, cache = FALSE, results = "asis", echo = FALSE} +tex_listing(hs_pH_2, "Refined hierarchichal HS fit with pH influence") +``` + +\clearpage + +## Session info + +```{r, echo = FALSE, cache = FALSE} +parallel::stopCluster(cl = cl) +sessionInfo() +``` + + +## Hardware info + +```{r, echo = FALSE} +if(!inherits(try(cpuinfo <- readLines("/proc/cpuinfo")), "try-error")) { + cat(gsub("model name\t: ", "CPU model: ", cpuinfo[grep("model name", cpuinfo)[1]])) +} +if(!inherits(try(meminfo <- readLines("/proc/meminfo")), "try-error")) { + cat(gsub("model name\t: ", "System memory: ", meminfo[grep("MemTotal", meminfo)[1]])) +} +``` |