Example evaluations of the dimethenamid data from 2018 • mkin

- - - - -

Wissenschaftlicher Berater, Kronacher -Str. 12, 79639 Grenzach-Wyhlen, Germany

Introduction -

A first analysis of the data analysed here was presented in a recent -journal article on nonlinear mixed-effects models in degradation -kinetics (Ranke et al. 2021). That -analysis was based on the nlme package and a development -version of the saemix package that was unpublished at the -time. Meanwhile, version 3.0 of the saemix package is -available from the CRAN repository. Also, it turned out that there was -an error in the handling of the Borstel data in the mkin package at the -time, leading to the duplication of a few data points from that soil. -The dataset in the mkin package has been corrected, and the interface to -saemix in the mkin package has been updated to use the -released version.

This vignette is intended to present an up to date analysis of the -data, using the corrected dataset and released versions of -mkin and saemix.

Data -

Residue data forming the basis for the endpoints derived in the -conclusion on the peer review of the pesticide risk assessment of -dimethenamid-P published by the European Food Safety Authority (EFSA) in -2018 (EFSA 2018) were transcribed from the -risk assessment report (Rapporteur Member State -Germany, Co-Rapporteur Member State Bulgaria 2018) which can be -downloaded from the Open EFSA repository https://open.efsa.europa.eu/study-inventory/EFSA-Q-2014-00716.

The data are available -in the mkin package. The following code (hidden by default, please -use the button to the right to show it) treats the data available for -the racemic mixture dimethenamid (DMTA) and its enantiomer -dimethenamid-P (DMTAP) in the same way, as no difference between their -degradation behaviour was identified in the EU risk assessment. The -observation times of each dataset are multiplied with the corresponding -normalisation factor also available in the dataset, in order to make it -possible to describe all datasets with a single set of parameters.

Also, datasets observed in the same soil are merged, resulting in -dimethenamid (DMTA) data from six soils.

-library(mkin, quietly = TRUE)
-dmta_ds <- lapply(1:7, function(i) {
-  ds_i <- dimethenamid_2018$ds[[i]]$data
-  ds_i[ds_i$name == "DMTAP", "name"] <-  "DMTA"
-  ds_i$time <- ds_i$time * dimethenamid_2018$f_time_norm[i]
-  ds_i
-})
-names(dmta_ds) <- sapply(dimethenamid_2018$ds, function(ds) ds$title)
-dmta_ds[["Elliot"]] <- rbind(dmta_ds[["Elliot 1"]], dmta_ds[["Elliot 2"]])
-dmta_ds[["Elliot 1"]] <- NULL
-dmta_ds[["Elliot 2"]] <- NULL

Parent degradation -

We evaluate the observed degradation of the parent compound using -simple exponential decline (SFO) and biexponential decline (DFOP), using -constant variance (const) and a two-component variance (tc) as error -models.

Separate evaluations -

As a first step, to get a visual impression of the fit of the -different models, we do separate evaluations for each soil using the -mmkin function from the mkin package:

-f_parent_mkin_const <- mmkin(c("SFO", "DFOP"), dmta_ds,
-  error_model = "const", quiet = TRUE)
-f_parent_mkin_tc <- mmkin(c("SFO", "DFOP"), dmta_ds,
-  error_model = "tc", quiet = TRUE)

The plot of the individual SFO fits shown below suggests that at -least in some datasets the degradation slows down towards later time -points, and that the scatter of the residuals error is smaller for -smaller values (panel to the right):

-plot(mixed(f_parent_mkin_const["SFO", ]))

Using biexponential decline (DFOP) results in a slightly more random -scatter of the residuals:

-plot(mixed(f_parent_mkin_const["DFOP", ]))

The population curve (bold line) in the above plot results from -taking the mean of the individual transformed parameters, i.e. of log k1 -and log k2, as well as of the logit of the g parameter of the DFOP -model). Here, this procedure does not result in parameters that -represent the degradation well, because in some datasets the fitted -value for k2 is extremely close to zero, leading to a log k2 value that -dominates the average. This is alleviated if only rate constants that -pass the t-test for significant difference from zero (on the -untransformed scale) are considered in the averaging:

-plot(mixed(f_parent_mkin_const["DFOP", ]), test_log_parms = TRUE)

While this is visually much more satisfactory, such an average -procedure could introduce a bias, as not all results from the individual -fits enter the population curve with the same weight. This is where -nonlinear mixed-effects models can help out by treating all datasets -with equally by fitting a parameter distribution model together with the -degradation model and the error model (see below).

The remaining trend of the residuals to be higher for higher -predicted residues is reduced by using the two-component error -model:

-plot(mixed(f_parent_mkin_tc["DFOP", ]), test_log_parms = TRUE)

However, note that in the case of using this error model, the fits to -the Flaach and BBA 2.3 datasets appear to be ill-defined, indicated by -the fact that they did not converge:

-print(f_parent_mkin_tc["DFOP", ])

<mmkin> object
-Status of individual fits:
-
-      dataset
-model  Calke Borstel Flaach BBA 2.2 BBA 2.3 Elliot
-  DFOP OK    OK      C      OK      C       OK    
-
-C: Optimisation did not converge:
-iteration limit reached without convergence (10)
-OK: No warnings

Nonlinear mixed-effects models -

Instead of taking a model selection decision for each of the -individual fits, we fit nonlinear mixed-effects models (using different -fitting algorithms as implemented in different packages) and do model -selection using all available data at the same time. In order to make -sure that these decisions are not unduly influenced by the type of -algorithm used, by implementation details or by the use of wrong control -parameters, we compare the model selection results obtained with -different R packages, with different algorithms and checking control -parameters.

nlme -

The nlme package was the first R extension providing facilities to -fit nonlinear mixed-effects models. We would like to do model selection -from all four combinations of degradation models and error models based -on the AIC. However, fitting the DFOP model with constant variance and -using default control parameters results in an error, signalling that -the maximum number of 50 iterations was reached, potentially indicating -overparameterisation. Nevertheless, the algorithm converges when the -two-component error model is used in combination with the DFOP model. -This can be explained by the fact that the smaller residues observed at -later sampling times get more weight when using the two-component error -model which will counteract the tendency of the algorithm to try -parameter combinations unsuitable for fitting these data.

-library(nlme)
-f_parent_nlme_sfo_const <- nlme(f_parent_mkin_const["SFO", ])
-# f_parent_nlme_dfop_const <- nlme(f_parent_mkin_const["DFOP", ])
-f_parent_nlme_sfo_tc <- nlme(f_parent_mkin_tc["SFO", ])
-f_parent_nlme_dfop_tc <- nlme(f_parent_mkin_tc["DFOP", ])

Note that a certain degree of overparameterisation is also indicated -by a warning obtained when fitting DFOP with the two-component error -model (‘false convergence’ in the ‘LME step’ in iteration 3). However, -as this warning does not occur in later iterations, and specifically not -in the last of the 5 iterations, we can ignore this warning.

The model comparison function of the nlme package can directly be -applied to these fits showing a much lower AIC for the DFOP model fitted -with the two-component error model. Also, the likelihood ratio test -indicates that this difference is significant as the p-value is below -0.0001.

-anova(
-  f_parent_nlme_sfo_const, f_parent_nlme_sfo_tc, f_parent_nlme_dfop_tc
-)

                        Model df    AIC    BIC  logLik   Test L.Ratio p-value
-f_parent_nlme_sfo_const     1  5 796.60 811.82 -393.30                       
-f_parent_nlme_sfo_tc        2  6 798.60 816.86 -393.30 1 vs 2    0.00   0.998
-f_parent_nlme_dfop_tc       3 10 671.91 702.34 -325.96 2 vs 3  134.69  <.0001

In addition to these fits, attempts were also made to include -correlations between random effects by using the log Cholesky -parameterisation of the matrix specifying them. The code used for these -attempts can be made visible below.

-f_parent_nlme_sfo_const_logchol <- nlme(f_parent_mkin_const["SFO", ],
-  random = nlme::pdLogChol(list(DMTA_0 ~ 1, log_k_DMTA ~ 1)))
-anova(f_parent_nlme_sfo_const, f_parent_nlme_sfo_const_logchol)
-f_parent_nlme_sfo_tc_logchol <- nlme(f_parent_mkin_tc["SFO", ],
-  random = nlme::pdLogChol(list(DMTA_0 ~ 1, log_k_DMTA ~ 1)))
-anova(f_parent_nlme_sfo_tc, f_parent_nlme_sfo_tc_logchol)
-f_parent_nlme_dfop_tc_logchol <- nlme(f_parent_mkin_const["DFOP", ],
-  random = nlme::pdLogChol(list(DMTA_0 ~ 1, log_k1 ~ 1, log_k2 ~ 1, g_qlogis ~ 1)))
-anova(f_parent_nlme_dfop_tc, f_parent_nlme_dfop_tc_logchol)

While the SFO variants converge fast, the additional parameters -introduced by this lead to convergence warnings for the DFOP model. The -model comparison clearly show that adding correlations between random -effects does not improve the fits.

The selected model (DFOP with two-component error) fitted to the data -assuming no correlations between random effects is shown below.

-plot(f_parent_nlme_dfop_tc)

saemix -

The saemix package provided the first Open Source implementation of -the Stochastic Approximation to the Expectation Maximisation (SAEM) -algorithm. SAEM fits of degradation models can be conveniently performed -using an interface to the saemix package available in current -development versions of the mkin package.

The corresponding SAEM fits of the four combinations of degradation -and error models are fitted below. As there is no convergence criterion -implemented in the saemix package, the convergence plots need to be -manually checked for every fit. We define control settings that work -well for all the parent data fits shown in this vignette.

-library(saemix)
-saemix_control <- saemixControl(nbiter.saemix = c(800, 300), nb.chains = 15,
-    print = FALSE, save = FALSE, save.graphs = FALSE, displayProgress = FALSE)
-saemix_control_moreiter <- saemixControl(nbiter.saemix = c(1600, 300), nb.chains = 15,
-    print = FALSE, save = FALSE, save.graphs = FALSE, displayProgress = FALSE)
-saemix_control_10k <- saemixControl(nbiter.saemix = c(10000, 300), nb.chains = 15,
-    print = FALSE, save = FALSE, save.graphs = FALSE, displayProgress = FALSE)

The convergence plot for the SFO model using constant variance is -shown below.

-f_parent_saemix_sfo_const <- mkin::saem(f_parent_mkin_const["SFO", ], quiet = TRUE,
-  control = saemix_control, transformations = "saemix")
-plot(f_parent_saemix_sfo_const$so, plot.type = "convergence")

Obviously the selected number of iterations is sufficient to reach -convergence. This can also be said for the SFO fit using the -two-component error model.

-f_parent_saemix_sfo_tc <- mkin::saem(f_parent_mkin_tc["SFO", ], quiet = TRUE,
-  control = saemix_control, transformations = "saemix")
-plot(f_parent_saemix_sfo_tc$so, plot.type = "convergence")

When fitting the DFOP model with constant variance (see below), -parameter convergence is not as unambiguous.

-f_parent_saemix_dfop_const <- mkin::saem(f_parent_mkin_const["DFOP", ], quiet = TRUE,
-  control = saemix_control, transformations = "saemix")
-plot(f_parent_saemix_dfop_const$so, plot.type = "convergence")

-print(f_parent_saemix_dfop_const)

Kinetic nonlinear mixed-effects model fit by SAEM
-Structural model:
-d_DMTA/dt = - ((k1 * g * exp(-k1 * time) + k2 * (1 - g) * exp(-k2 *
-           time)) / (g * exp(-k1 * time) + (1 - g) * exp(-k2 * time)))
-           * DMTA
-
-Data:
-155 observations of 1 variable(s) grouped in 6 datasets
-
-Likelihood computed by importance sampling
-  AIC BIC logLik
-  706 704   -344
-
-Fitted parameters:
-          estimate    lower   upper
-DMTA_0    97.99583 96.50079 99.4909
-k1         0.06377  0.03432  0.0932
-k2         0.00848  0.00444  0.0125
-g          0.95701  0.91313  1.0009
-a.1        1.82141  1.65122  1.9916
-SD.DMTA_0  1.64787  0.45772  2.8380
-SD.k1      0.57439  0.24731  0.9015
-SD.k2      0.03296 -2.50195  2.5679
-SD.g       1.10266  0.32369  1.8816

While the other parameters converge to credible values, the variance -of k2 (omega2.k2) converges to a very small value. The -printout of the saem.mmkin model shows that the estimated -standard deviation of k2 across the population of soils -(SD.k2) is ill-defined, indicating overparameterisation of -this model.

When the DFOP model is fitted with the two-component error model, we -also observe that the estimated variance of k2 becomes very small, while -being ill-defined, as illustrated by the excessive confidence interval -of SD.k2.

-f_parent_saemix_dfop_tc <- mkin::saem(f_parent_mkin_tc["DFOP", ], quiet = TRUE,
-  control = saemix_control, transformations = "saemix")
-f_parent_saemix_dfop_tc_moreiter <- mkin::saem(f_parent_mkin_tc["DFOP", ], quiet = TRUE,
-  control = saemix_control_moreiter, transformations = "saemix")
-plot(f_parent_saemix_dfop_tc$so, plot.type = "convergence")

-print(f_parent_saemix_dfop_tc)

Kinetic nonlinear mixed-effects model fit by SAEM
-Structural model:
-d_DMTA/dt = - ((k1 * g * exp(-k1 * time) + k2 * (1 - g) * exp(-k2 *
-           time)) / (g * exp(-k1 * time) + (1 - g) * exp(-k2 * time)))
-           * DMTA
-
-Data:
-155 observations of 1 variable(s) grouped in 6 datasets
-
-Likelihood computed by importance sampling
-  AIC BIC logLik
-  666 664   -323
-
-Fitted parameters:
-          estimate    lower    upper
-DMTA_0    98.27617  96.3088 100.2436
-k1         0.06437   0.0337   0.0950
-k2         0.00880   0.0063   0.0113
-g          0.95249   0.9100   0.9949
-a.1        1.06161   0.8625   1.2607
-b.1        0.02967   0.0226   0.0367
-SD.DMTA_0  2.06075   0.4187   3.7028
-SD.k1      0.59357   0.2561   0.9310
-SD.k2      0.00292 -10.2960  10.3019
-SD.g       1.05725   0.3808   1.7337

Doubling the number of iterations in the first phase of the algorithm -leads to a slightly lower likelihood, and therefore to slightly higher -AIC and BIC values. With even more iterations, the algorithm stops with -an error message. This is related to the variance of k2 approximating -zero and has been submitted as a bug -to the saemix package, as the algorithm does not converge in this -case.

An alternative way to fit DFOP in combination with the two-component -error model is to use the model formulation with transformed parameters -as used per default in mkin. When using this option, convergence is -slower, but eventually the algorithm stops as well with the same error -message.

The four combinations (SFO/const, SFO/tc, DFOP/const and DFOP/tc) and -the version with increased iterations can be compared using the model -comparison function of the saemix package:

-AIC_parent_saemix <- saemix::compare.saemix(
-  f_parent_saemix_sfo_const$so,
-  f_parent_saemix_sfo_tc$so,
-  f_parent_saemix_dfop_const$so,
-  f_parent_saemix_dfop_tc$so,
-  f_parent_saemix_dfop_tc_moreiter$so)

Likelihoods calculated by importance sampling

-rownames(AIC_parent_saemix) <- c(
-  "SFO const", "SFO tc", "DFOP const", "DFOP tc", "DFOP tc more iterations")
-print(AIC_parent_saemix)

                           AIC    BIC
-SFO const               796.38 795.34
-SFO tc                  798.38 797.13
-DFOP const              705.75 703.88
-DFOP tc                 665.65 663.57
-DFOP tc more iterations 665.88 663.80

In order to check the influence of the likelihood calculation -algorithms implemented in saemix, the likelihood from Gaussian -quadrature is added to the best fit, and the AIC values obtained from -the three methods are compared.

-f_parent_saemix_dfop_tc$so <-
-  saemix::llgq.saemix(f_parent_saemix_dfop_tc$so)
-AIC_parent_saemix_methods <- c(
-  is = AIC(f_parent_saemix_dfop_tc$so, method = "is"),
-  gq = AIC(f_parent_saemix_dfop_tc$so, method = "gq"),
-  lin = AIC(f_parent_saemix_dfop_tc$so, method = "lin")
-)
-print(AIC_parent_saemix_methods)

    is     gq    lin 
-665.65 665.68 665.11

The AIC values based on importance sampling and Gaussian quadrature -are very similar. Using linearisation is known to be less accurate, but -still gives a similar value.

In order to illustrate that the comparison of the three method -depends on the degree of convergence obtained in the fit, the same -comparison is shown below for the fit using the defaults for the number -of iterations and the number of MCMC chains.

When using OpenBlas for linear algebra, there is a large difference -in the values obtained with Gaussian quadrature, so the larger number of -iterations makes a lot of difference. When using the LAPACK version -coming with Debian Bullseye, the AIC based on Gaussian quadrature is -almost the same as the one obtained with the other methods, also when -using defaults for the fit.

-f_parent_saemix_dfop_tc_defaults <- mkin::saem(f_parent_mkin_tc["DFOP", ])
-f_parent_saemix_dfop_tc_defaults$so <-
-  saemix::llgq.saemix(f_parent_saemix_dfop_tc_defaults$so)
-AIC_parent_saemix_methods_defaults <- c(
-  is = AIC(f_parent_saemix_dfop_tc_defaults$so, method = "is"),
-  gq = AIC(f_parent_saemix_dfop_tc_defaults$so, method = "gq"),
-  lin = AIC(f_parent_saemix_dfop_tc_defaults$so, method = "lin")
-)
-print(AIC_parent_saemix_methods_defaults)

    is     gq    lin 
-669.77 669.36 670.95

Comparison -

The following table gives the AIC values obtained with both backend -packages using the same control parameters (800 iterations burn-in, 300 -iterations second phase, 15 chains).

-AIC_all <- data.frame(
-  check.names = FALSE,
-  "Degradation model" = c("SFO", "SFO", "DFOP", "DFOP"),
-  "Error model" = c("const", "tc", "const", "tc"),
-  nlme = c(AIC(f_parent_nlme_sfo_const), AIC(f_parent_nlme_sfo_tc), NA, AIC(f_parent_nlme_dfop_tc)),
-  saemix_lin = sapply(list(f_parent_saemix_sfo_const$so, f_parent_saemix_sfo_tc$so,
-    f_parent_saemix_dfop_const$so, f_parent_saemix_dfop_tc$so), AIC, method = "lin"),
-  saemix_is = sapply(list(f_parent_saemix_sfo_const$so, f_parent_saemix_sfo_tc$so,
-    f_parent_saemix_dfop_const$so, f_parent_saemix_dfop_tc$so), AIC, method = "is")
-)
-kable(AIC_all)

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Degradation model	Error model	nlme	saemix_lin	saemix_is
SFO	const	796.60	796.60	796.38
SFO	tc	798.60	798.60	798.38
DFOP	const	NA	709.26	705.75
DFOP	tc	671.91	665.11	665.65

Conclusion -

A more detailed analysis of the dimethenamid dataset confirmed that -the DFOP model provides the most appropriate description of the decline -of the parent compound in these data. On the other hand, closer -inspection of the results revealed that the variability of the k2 -parameter across the population of soils is ill-defined. This coincides -with the observation that this parameter cannot robustly be quantified -for some of the soils.

Regarding the regulatory use of these data, it is claimed that an -improved characterisation of the mean parameter values across the -population is obtained using the nonlinear mixed-effects models -presented here. However, attempts to quantify the variability of the -slower rate constant of the biphasic decline of dimethenamid indicate -that the data are not sufficient to characterise this variability to a -satisfactory precision.

Session Info -

-sessionInfo()

R version 4.2.3 (2023-03-15)
-Platform: x86_64-pc-linux-gnu (64-bit)
-Running under: Debian GNU/Linux 12 (bookworm)
-
-Matrix products: default
-BLAS:   /usr/lib/x86_64-linux-gnu/openblas-serial/libblas.so.3
-LAPACK: /usr/lib/x86_64-linux-gnu/openblas-serial/libopenblas-r0.3.21.so
-
-locale:
- [1] LC_CTYPE=de_DE.UTF-8       LC_NUMERIC=C              
- [3] LC_TIME=C                  LC_COLLATE=de_DE.UTF-8    
- [5] LC_MONETARY=de_DE.UTF-8    LC_MESSAGES=de_DE.UTF-8   
- [7] LC_PAPER=de_DE.UTF-8       LC_NAME=C                 
- [9] LC_ADDRESS=C               LC_TELEPHONE=C            
-[11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C       
-
-attached base packages:
-[1] stats     graphics  grDevices utils     datasets  methods   base     
-
-other attached packages:
-[1] saemix_3.2   npde_3.3     nlme_3.1-162 mkin_1.2.3   knitr_1.42  
-
-loaded via a namespace (and not attached):
- [1] highr_0.10        pillar_1.9.0      bslib_0.4.2       compiler_4.2.3   
- [5] jquerylib_0.1.4   tools_4.2.3       mclust_6.0.0      digest_0.6.31    
- [9] tibble_3.2.1      jsonlite_1.8.4    evaluate_0.20     memoise_2.0.1    
-[13] lifecycle_1.0.3   gtable_0.3.3      lattice_0.21-8    pkgconfig_2.0.3  
-[17] rlang_1.1.0       DBI_1.1.3         cli_3.6.1         yaml_2.3.7       
-[21] parallel_4.2.3    pkgdown_2.0.7     xfun_0.38         fastmap_1.1.1    
-[25] gridExtra_2.3     dplyr_1.1.1       stringr_1.5.0     generics_0.1.3   
-[29] desc_1.4.2        fs_1.6.1          vctrs_0.6.1       sass_0.4.5       
-[33] systemfonts_1.0.4 tidyselect_1.2.0  rprojroot_2.0.3   lmtest_0.9-40    
-[37] grid_4.2.3        glue_1.6.2        R6_2.5.1          textshaping_0.3.6
-[41] fansi_1.0.4       rmarkdown_2.21    purrr_1.0.1       ggplot2_3.4.2    
-[45] magrittr_2.0.3    codetools_0.2-19  scales_1.2.1      htmltools_0.5.5  
-[49] colorspace_2.1-0  ragg_1.2.5        utf8_1.2.3        stringi_1.7.12   
-[53] munsell_0.5.0     cachem_1.0.7      zoo_1.8-12

References -

- -

-EFSA. 2018. “Peer Review of the Pesticide Risk Assessment of the -Active Substance Dimethenamid-p.” EFSA Journal 16: 5211. -

-Ranke, Johannes, Janina Wöltjen, Jana Schmidt, and Emmanuelle Comets. -2021. “Taking Kinetic Evaluations of Degradation Data to the Next -Level with Nonlinear Mixed-Effects Models.” Environments -8 (8). https://doi.org/10.3390/environments8080071. -

-Rapporteur Member State Germany, Co-Rapporteur Member State Bulgaria. -2018. “Renewal Assessment Report -Dimethenamid-P Volume 3 - B.8 Environmental fate and behaviour, Rev. 2 - -November 2017.” https://open.efsa.europa.eu/study-inventory/EFSA-Q-2014-00716. -

- - - -

Example evaluations of the dimethenamid data -from 2018

Johannes -Ranke

Last change 1 July 2022, -built on 16 Apr 2023