--- title: "Testing hierarchical pathway kinetics with residue data on cyantraniliprole" author: Johannes Ranke date: Last change on 20 April 2023, last compiled on `r format(Sys.time(), "%e %B %Y")` output: pdf_document: extra_dependencies: ["float", "listing"] toc: yes geometry: margin=2cm --- ```{r setup, echo = FALSE, cache = FALSE} options(width = 80) # For summary listings knitr::opts_chunk$set( comment = "", tidy = FALSE, cache = TRUE, fig.pos = "H", fig.align = "center" ) ``` \clearpage # Introduction The purpose of this document is to test demonstrate how nonlinear hierarchical models (NLHM) based on the parent degradation models SFO, FOMC, DFOP and HS, with serial formation of two or more metabolites can be fitted with the mkin package. It was assembled in the course of work package 1.2 of Project Number 173340 (Application of nonlinear hierarchical models to the kinetic evaluation of chemical degradation data) of the German Environment Agency carried out in 2022 and 2023. The mkin package is used in version `r packageVersion("mkin")` which is currently under development. The newly introduced functionality that is used here is a simplification of excluding random effects for a set of fits based on a related set of fits with a reduced model, and the documentation of the starting parameters of the fit, so that all starting parameters of `saem` fits are now listed in the summary. The `saemix` package is used as a backend for fitting the NLHM, but is also loaded to make the convergence plot function available. This document is processed with the `knitr` package, which also provides the `kable` function that is used to improve the display of tabular data in R markdown documents. For parallel processing, the `parallel` package is used. ```{r, packages, cache = FALSE, message = FALSE} library(mkin) library(knitr) library(saemix) library(parallel) n_cores <- detectCores() # We need to start a new cluster after defining a compiled model that is # saved as a DLL to the user directory, therefore we define a function # This is used again after defining the pathway model start_cluster <- function(n_cores) { if (Sys.info()["sysname"] == "Windows") { ret <- makePSOCKcluster(n_cores) } else { ret <- makeForkCluster(n_cores) } return(ret) } cl <- start_cluster(n_cores) ``` \clearpage ## Test data The example data are taken from the final addendum to the DAR from 2014 and are distributed with the mkin package. Residue data and time step normalisation factors are read in using the function `read_spreadsheet` from the mkin package. This function also performs the time step normalisation. ```{r data} data_file <- system.file( "testdata", "cyantraniliprole_soil_efsa_2014.xlsx", package = "mkin") cyan_ds <- read_spreadsheet(data_file, parent_only = FALSE) ``` The following tables show the covariate data and the `r length(cyan_ds)` datasets that were read in from the spreadsheet file. ```{r show-covar-data, dependson = "data", results = "asis"} pH <- attr(cyan_ds, "covariates") kable(pH, caption = "Covariate data") ``` \clearpage ```{r show-data, dependson = "data", results = "asis"} for (ds_name in names(cyan_ds)) { print( kable(mkin_long_to_wide(cyan_ds[[ds_name]]), caption = paste("Dataset", ds_name), booktabs = TRUE, row.names = FALSE)) cat("\n\\clearpage\n") } ``` \clearpage # Parent only evaluations As the pathway fits have very long run times, evaluations of the parent data are performed first, in order to determine for each hierarchical parent degradation model which random effects on the degradation model parameters are ill-defined. ```{r parent-only, dependson = "data"} cyan_sep_const <- mmkin(c("SFO", "FOMC", "DFOP", "SFORB", "HS"), cyan_ds, quiet = TRUE, cores = n_cores) cyan_sep_tc <- update(cyan_sep_const, error_model = "tc") cyan_saem_full <- mhmkin(list(cyan_sep_const, cyan_sep_tc)) status(cyan_saem_full) |> kable() ``` All fits converged successfully. ```{r dependson = "parent-only"} illparms(cyan_saem_full) |> kable() ``` In almost all models, the random effect for the initial concentration of the parent compound is ill-defined. For the biexponential models DFOP and SFORB, the random effect of one additional parameter is ill-defined when the two-component error model is used. ```{r dependson = "parent-only"} anova(cyan_saem_full) |> kable(digits = 1) ``` Model comparison based on AIC and BIC indicates that the two-component error model is preferable for all parent models with the exception of DFOP. The lowest AIC and BIC values are are obtained with the FOMC model, followed by SFORB and DFOP. ```{r parent-only-reduced, dependson = "parent-only", include = FALSE} cyan_saem_reduced <- mhmkin(list(cyan_sep_const, cyan_sep_tc), no_random_effect = illparms(cyan_saem_full)) illparms(cyan_saem_reduced) anova(cyan_saem_reduced) |> kable(digits = 1) ``` ```{r} stopCluster(cl) ``` # Pathway fits ## Evaluations with pathway established previously To test the technical feasibility of coupling the relevant parent degradation models with different transformation pathway models, a list of `mkinmod` models is set up below. As in the EU evaluation, parallel formation of metabolites JCZ38 and J9Z38 and secondary formation of metabolite JSE76 from JCZ38 is used. ```{r, cyan-path-1} if (!dir.exists("cyan_dlls")) dir.create("cyan_dlls") cyan_path_1 <- list( sfo_path_1 = mkinmod( cyan = mkinsub("SFO", c("JCZ38", "J9Z38")), JCZ38 = mkinsub("SFO", "JSE76"), J9Z38 = mkinsub("SFO"), JSE76 = mkinsub("SFO"), quiet = TRUE, name = "sfo_path_1", dll_dir = "cyan_dlls", overwrite = TRUE), fomc_path_1 = mkinmod( cyan = mkinsub("FOMC", c("JCZ38", "J9Z38")), JCZ38 = mkinsub("SFO", "JSE76"), J9Z38 = mkinsub("SFO"), JSE76 = mkinsub("SFO"), quiet = TRUE, name = "fomc_path_1", dll_dir = "cyan_dlls", overwrite = TRUE), dfop_path_1 = mkinmod( cyan = mkinsub("DFOP", c("JCZ38", "J9Z38")), JCZ38 = mkinsub("SFO", "JSE76"), J9Z38 = mkinsub("SFO"), JSE76 = mkinsub("SFO"), quiet = TRUE, name = "dfop_path_1", dll_dir = "cyan_dlls", overwrite = TRUE), sforb_path_1 = mkinmod( cyan = mkinsub("SFORB", c("JCZ38", "J9Z38")), JCZ38 = mkinsub("SFO", "JSE76"), J9Z38 = mkinsub("SFO"), JSE76 = mkinsub("SFO"), quiet = TRUE, name = "sforb_path_1", dll_dir = "cyan_dlls", overwrite = TRUE), hs_path_1 = mkinmod( cyan = mkinsub("HS", c("JCZ38", "J9Z38")), JCZ38 = mkinsub("SFO", "JSE76"), J9Z38 = mkinsub("SFO"), JSE76 = mkinsub("SFO"), quiet = TRUE, name = "hs_path_1", dll_dir = "cyan_dlls", overwrite = TRUE) ) cl_path_1 <- start_cluster(n_cores) ``` To obtain suitable starting values for the NLHM fits, separate pathway fits are performed for all datasets. ```{r, f-sep-1, dependson = c("data", "cyan_path_1")} f_sep_1_const <- mmkin( cyan_path_1, cyan_ds, error_model = "const", cluster = cl_path_1, quiet = TRUE) status(f_sep_1_const) |> kable() f_sep_1_tc <- update(f_sep_1_const, error_model = "tc") status(f_sep_1_tc) |> kable() ``` Most separate fits converged successfully. The biggest convergence problems are seen when using the HS model with constant variance. For the hierarchical pathway fits, those random effects that could not be quantified in the corresponding parent data analyses are excluded. In the code below, the output of the `illparms` function for the parent only fits is used as an argument `no_random_effect` to the `mhmkin` function. The possibility to do so was introduced in mkin version `1.2.2` which is currently under development. ```{r, f-saem-1, dependson = "f-sep-1"} f_saem_1 <- mhmkin(list(f_sep_1_const, f_sep_1_tc), no_random_effect = illparms(cyan_saem_full), cluster = cl_path_1) ``` ```{r dependson = "f-saem-1"} status(f_saem_1) |> kable() ``` The status information from the individual fits shows that all fits completed successfully. The matrix entries Fth and FO indicate that the Fisher Information Matrix could not be inverted for the fixed effects (theta) and the random effects (Omega), respectively. For the affected fits, ill-defined parameters cannot be determined using the `illparms` function, because it relies on the Fisher Information Matrix. ```{r dependson = "f-saem-1"} illparms(f_saem_1) |> kable() ``` The model comparison below suggests that the pathway fits using DFOP or SFORB for the parent compound provide the best fit. ```{r, dependson = "f-saem-1"} anova(f_saem_1) |> kable(digits = 1) ``` For these two parent model, successful fits are shown below. Plots of the fits with the other parent models are shown in the Appendix. ```{r fig.cap = "DFOP pathway fit with two-component error", dependson = "f-saem-1", fig.height = 8} plot(f_saem_1[["dfop_path_1", "tc"]]) ``` \clearpage ```{r fig.cap = "SFORB pathway fit with two-component error", dependson = "f-saem-1", fig.height = 8} plot(f_saem_1[["sforb_path_1", "tc"]]) ``` A closer graphical analysis of these Figures shows that the residues of transformation product JCZ38 in the soils Tama and Nambsheim observed at later time points are strongly and systematically underestimated. \clearpage ```{r} stopCluster(cl_path_1) ``` ## Alternative pathway fits To improve the fit for JCZ38, a back-reaction from JSE76 to JCZ38 was introduced in an alternative version of the transformation pathway, in analogy to the back-reaction from K5A78 to K5A77. Both pairs of transformation products are pairs of an organic acid with its corresponding amide (Addendum 2014, p. 109). As FOMC provided the best fit for the parent, and the biexponential models DFOP and SFORB provided the best initial pathway fits, these three parent models are used in the alternative pathway fits. ```{r, f-sep-2-const, dependson = "data"} cyan_path_2 <- list( fomc_path_2 = mkinmod( cyan = mkinsub("FOMC", c("JCZ38", "J9Z38")), JCZ38 = mkinsub("SFO", "JSE76"), J9Z38 = mkinsub("SFO"), JSE76 = mkinsub("SFO", "JCZ38"), name = "fomc_path_2", quiet = TRUE, dll_dir = "cyan_dlls", overwrite = TRUE ), dfop_path_2 = mkinmod( cyan = mkinsub("DFOP", c("JCZ38", "J9Z38")), JCZ38 = mkinsub("SFO", "JSE76"), J9Z38 = mkinsub("SFO"), JSE76 = mkinsub("SFO", "JCZ38"), name = "dfop_path_2", quiet = TRUE, dll_dir = "cyan_dlls", overwrite = TRUE ), sforb_path_2 = mkinmod( cyan = mkinsub("SFORB", c("JCZ38", "J9Z38")), JCZ38 = mkinsub("SFO", "JSE76"), J9Z38 = mkinsub("SFO"), JSE76 = mkinsub("SFO", "JCZ38"), name = "sforb_path_2", quiet = TRUE, dll_dir = "cyan_dlls", overwrite = TRUE ) ) cl_path_2 <- start_cluster(n_cores) f_sep_2_const <- mmkin( cyan_path_2, cyan_ds, error_model = "const", cluster = cl_path_2, quiet = TRUE) status(f_sep_2_const) |> kable() ``` Using constant variance, separate fits converge with the exception of the fits to the Sassafras soil data. ```{r f-sep-2-tc, dependson = "f-sep-2-const"} f_sep_2_tc <- update(f_sep_2_const, error_model = "tc") status(f_sep_2_tc) |> kable() ``` Using the two-component error model, all separate fits converge with the exception of the alternative pathway fit with DFOP used for the parent and the Sassafras dataset. ```{r f-saem-2, dependson = c("f-sep-2-const", "f-sep-2-tc")} f_saem_2 <- mhmkin(list(f_sep_2_const, f_sep_2_tc), no_random_effect = illparms(cyan_saem_full[2:4, ]), cluster = cl_path_2) ``` ```{r dependson = "f-saem-2"} status(f_saem_2) |> kable() ``` The hierarchical fits for the alternative pathway completed successfully. ```{r dependson = "f-saem-2"} illparms(f_saem_2) |> kable() ``` In both fits, the random effects for the formation fractions for the pathways from JCZ38 to JSE76, and for the reverse pathway from JSE76 to JCZ38 are ill-defined. ```{r dependson = "f-saem-2"} anova(f_saem_2) |> kable(digits = 1) ``` The variants using the biexponential models DFOP and SFORB for the parent compound and the two-component error model give the lowest AIC and BIC values and are plotted below. Compared with the original pathway, the AIC and BIC values indicate a large improvement. This is confirmed by the plots, which show that the metabolite JCZ38 is fitted much better with this model. \clearpage ```{r fig.cap = "FOMC pathway fit with two-component error, alternative pathway", dependson = "f-saem-2", fig.height = 8} plot(f_saem_2[["fomc_path_2", "tc"]]) ``` \clearpage ```{r fig.cap = "DFOP pathway fit with two-component error, alternative pathway", dependson = "f-saem-2", fig.height = 8} plot(f_saem_2[["dfop_path_2", "tc"]]) ``` \clearpage ```{r fig.cap = "SFORB pathway fit with two-component error, alternative pathway", dependson = "f-saem-2", fig.height = 8} plot(f_saem_2[["sforb_path_2", "tc"]]) ``` \clearpage ## Refinement of alternative pathway fits All ill-defined random effects that were identified in the parent only fits and in the above pathway fits, are excluded for the final evaluations below. For this purpose, a list of character vectors is created below that can be indexed by row and column indices, and which contains the degradation parameter names for which random effects should be excluded for each of the hierarchical fits contained in `f_saem_2`. ```{r f-saem-3, dependson = "f-saem-2"} no_ranef <- matrix(list(), nrow = 3, ncol = 2, dimnames = dimnames(f_saem_2)) no_ranef[["fomc_path_2", "const"]] <- c("log_beta", "f_JCZ38_qlogis", "f_JSE76_qlogis") no_ranef[["fomc_path_2", "tc"]] <- c("cyan_0", "f_JCZ38_qlogis", "f_JSE76_qlogis") no_ranef[["dfop_path_2", "const"]] <- c("cyan_0", "f_JCZ38_qlogis", "f_JSE76_qlogis") no_ranef[["dfop_path_2", "tc"]] <- c("cyan_0", "log_k1", "f_JCZ38_qlogis", "f_JSE76_qlogis") no_ranef[["sforb_path_2", "const"]] <- c("cyan_free_0", "f_JCZ38_qlogis", "f_JSE76_qlogis") no_ranef[["sforb_path_2", "tc"]] <- c("cyan_free_0", "log_k_cyan_free_bound", "f_JCZ38_qlogis", "f_JSE76_qlogis") clusterExport(cl_path_2, "no_ranef") f_saem_3 <- update(f_saem_2, no_random_effect = no_ranef, cluster = cl_path_2) ``` ```{r dependson = "f-saem-3"} status(f_saem_3) |> kable() ``` With the exception of the FOMC pathway fit with constant variance, all updated fits completed successfully. However, the Fisher Information Matrix for the fixed effects (Fth) could not be inverted, so no confidence intervals for the optimised parameters are available. ```{r dependson = "f-saem-3"} illparms(f_saem_3) |> kable() ``` ```{r dependson = "f-saem-3"} anova(f_saem_3) |> kable(digits = 1) ``` While the AIC and BIC values of the best fit (DFOP pathway fit with two-component error) are lower than in the previous fits with the alternative pathway, the practical value of these refined evaluations is limited as no confidence intervals are obtained. ```{r} stopCluster(cl_path_2) ``` \clearpage # Conclusion It was demonstrated that a relatively complex transformation pathway with parallel formation of two primary metabolites and one secondary metabolite can be fitted even if the data in the individual datasets are quite different and partly only cover the formation phase. The run times of the pathway fits were several hours, limiting the practical feasibility of iterative refinements based on ill-defined parameters and of alternative checks of parameter identifiability based on multistart runs. # Acknowledgements The helpful comments by Janina Wöltjen of the German Environment Agency are gratefully acknowledged. \clearpage # Appendix ## Plots of fits that were not refined further ```{r fig.cap = "SFO pathway fit with two-component error", dependson = "f-saem-1", fig.height = 8} plot(f_saem_1[["sfo_path_1", "tc"]]) ``` \clearpage ```{r fig.cap = "FOMC pathway fit with two-component error", dependson = "f-saem-1", fig.height = 8} plot(f_saem_1[["fomc_path_1", "tc"]]) ``` \clearpage ```{r fig.cap = "HS pathway fit with two-component error", dependson = "f-saem-1", fig.height = 8} plot(f_saem_1[["sforb_path_1", "tc"]]) ``` \clearpage ## Hierarchical fit listings ### Pathway 1 ```{r listings-1, results = "asis", echo = FALSE, cache = FALSE} errmods <- c(const = "constant variance", tc = "two-component error") degmods <- c( sfo_path_1 = "SFO path 1", fomc_path_1 = "FOMC path 1", dfop_path_1 = "DFOP path 1", sforb_path_1 = "SFORB path 1", hs_path_1 = "HS path 1") for (deg_mod in rownames(f_saem_1)) { for (err_mod in c("const", "tc")) { fit <- f_saem_1[[deg_mod, err_mod]] if (!inherits(fit$so, "try-error")) { caption <- paste("Hierarchical", degmods[deg_mod], "fit with", errmods[err_mod]) summary_listing(fit, caption) } } } ``` ### Pathway 2 ```{r listings-2, results = "asis", echo = FALSE, cache = FALSE} degmods <- c( fomc_path_2 = "FOMC path 2", dfop_path_2 = "DFOP path 2", sforb_path_2 = "SFORB path 2") for (deg_mod in rownames(f_saem_2)) { for (err_mod in c("const", "tc")) { fit <- f_saem_2[[deg_mod, err_mod]] if (!inherits(fit$so, "try-error")) { caption <- paste("Hierarchical", degmods[deg_mod], "fit with", errmods[err_mod]) summary_listing(fit, caption) } } } ``` ### Pathway 2, refined fits ```{r listings-3, results = "asis", echo = FALSE, cache = FALSE} degmods <- c( fomc_path_2 = "FOMC path 2", dfop_path_2 = "DFOP path 2", sforb_path_2 = "SFORB path 2") for (deg_mod in rownames(f_saem_3)) { for (err_mod in c("const", "tc")) { fit <- f_saem_3[[deg_mod, err_mod]] if (!inherits(fit$so, "try-error")) { caption <- paste("Hierarchical", degmods[deg_mod], "fit with reduced random effects,", errmods[err_mod]) summary_listing(fit, caption) } } } ``` ## Session info ```{r, echo = FALSE, cache = FALSE} sessionInfo() ``` ## Hardware info ```{r, echo = FALSE} if(!inherits(try(cpuinfo <- readLines("/proc/cpuinfo")), "try-error")) { cat(gsub("model name\t: ", "CPU model: ", cpuinfo[grep("model name", cpuinfo)[1]])) } if(!inherits(try(meminfo <- readLines("/proc/meminfo")), "try-error")) { cat(gsub("model name\t: ", "System memory: ", meminfo[grep("MemTotal", meminfo)[1]])) } ```