aboutsummaryrefslogtreecommitdiff
path: root/vignettes/web_only/dimethenamid_2018.rmd
blob: c152e5782214c487236e2ef818e02b58ef3ee475 (plain) (blame)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
---
title: Example evaluations of the dimethenamid data from 2018
author: Johannes Ranke
date: Last change 4 August 2021, built on `r format(Sys.Date(), format = "%d %b %Y")`
output:
  html_document:
    toc: true
    toc_float: true
    code_folding: hide
    fig_retina: null
bibliography: ../references.bib
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

[Wissenschaftlicher Berater, Kronacher Str. 12, 79639 Grenzach-Wyhlen, Germany](http://www.jrwb.de)<br />
[Privatdozent at the University of Bremen](http://chem.uft.uni-bremen.de/ranke)

```{r, include = FALSE}
require(knitr)
options(digits = 5)
opts_chunk$set(
  comment = "",
  tidy = FALSE,
  cache = TRUE
)
```

# Introduction

During the preparation of the journal article on nonlinear mixed-effects models in
degradation kinetics (submitted) and the analysis of the dimethenamid degradation
data analysed therein, a need for a more detailed analysis using not only nlme and saemix,
but also nlmixr for fitting the mixed-effects models was identified.

This vignette is an attempt to satisfy this need.

# Data

Residue data forming the basis for the endpoints derived in the conclusion on
the peer review of the pesticide risk assessment of dimethenamid-P published by
the European Food Safety Authority (EFSA) in 2018 [@efsa_2018_dimethenamid]
were transcribed from the risk assessment report [@dimethenamid_rar_2018_b8]
which can be downloaded from the
[EFSA register of questions](https://open.efsa.europa.eu/study-inventory/EFSA-Q-2014-00716).

The data are [available in the mkin
package](https://pkgdown.jrwb.de/mkin/reference/dimethenamid_2018.html). The
following code (hidden by default, please use the button to the right to show
it) treats the data available for the racemic mixture dimethenamid (DMTA) and
its enantiomer dimethenamid-P (DMTAP) in the same way, as no difference between
their degradation behaviour was identified in the EU risk assessment. The
observation times of each dataset are multiplied with the corresponding
normalisation factor also available in the dataset, in order to make it
possible to describe all datasets with a single set of parameters.

Also, datasets observed in the same soil are merged, resulting in dimethenamid
(DMTA) data from six soils.

```{r dimethenamid_data}
library(mkin)
dmta_ds <- lapply(1:8, function(i) {
  ds_i <- dimethenamid_2018$ds[[i]]$data
  ds_i[ds_i$name == "DMTAP", "name"] <-  "DMTA"
  ds_i$time <- ds_i$time * dimethenamid_2018$f_time_norm[i]
  ds_i
})
names(dmta_ds) <- sapply(dimethenamid_2018$ds, function(ds) ds$title)
dmta_ds[["Borstel"]] <- rbind(dmta_ds[["Borstel 1"]], dmta_ds[["Borstel 2"]])
dmta_ds[["Borstel 1"]] <- NULL
dmta_ds[["Borstel 2"]] <- NULL
dmta_ds[["Elliot"]] <- rbind(dmta_ds[["Elliot 1"]], dmta_ds[["Elliot 2"]])
dmta_ds[["Elliot 1"]] <- NULL
dmta_ds[["Elliot 2"]] <- NULL
```

# Parent degradation

We evaluate the observed degradation of the parent compound using simple
exponential decline (SFO) and biexponential decline (DFOP), using constant
variance (const) and a two-component variance (tc) as error models.

## Separate evaluations

As a first step, to get a visual impression of the fit of the different models,
we do separate evaluations for each soil using the mmkin function from the
mkin package:

```{r f_parent_mkin}
f_parent_mkin_const <- mmkin(c("SFO", "DFOP"), dmta_ds,
  error_model = "const", quiet = TRUE)
f_parent_mkin_tc <- mmkin(c("SFO", "DFOP"), dmta_ds,
  error_model = "tc", quiet = TRUE)
```

The plot of the individual SFO fits shown below suggests that at least in some
datasets the degradation slows down towards later time points, and that the
scatter of the residuals error is smaller for smaller values (panel to the
right):

```{r f_parent_mkin_sfo_const}
plot(mixed(f_parent_mkin_const["SFO", ]))
```

Using biexponential decline (DFOP) results in a slightly more random
scatter of the residuals:

```{r f_parent_mkin_dfop_const}
plot(mixed(f_parent_mkin_const["DFOP", ]))
```

The population curve (bold line) in the above plot results from taking the mean
of the individual transformed parameters, i.e. of log k1 and log k2, as well as
of the logit of the g parameter of the DFOP model). Here, this procedure
does not result in parameters that represent the degradation well, because in some
datasets the fitted value for k2 is extremely close to zero, leading to a log
k2 value that dominates the average. This is alleviated if only rate constants
that pass the t-test for significant difference from zero (on the untransformed
scale) are considered in the averaging:

```{r f_parent_mkin_dfop_const_test}
plot(mixed(f_parent_mkin_const["DFOP", ]), test_log_parms = TRUE)
```

While this is visually much more satisfactory, such an average procedure could
introduce a bias, as not all results from the individual fits enter the
population curve with the same weight. This is where nonlinear mixed-effects
models can help out by treating all datasets with equally by fitting a
parameter distribution model together with the degradation model and the error
model (see below).

The remaining trend of the residuals to be higher for higher predicted residues
is reduced by using the two-component error model:

```{r f_parent_mkin_dfop_tc_test}
plot(mixed(f_parent_mkin_tc["DFOP", ]), test_log_parms = TRUE)
```

## Nonlinear mixed-effects models

Instead of taking a model selection decision for each of the individual fits, we fit
nonlinear mixed-effects models (using different fitting algorithms as implemented in
different packages) and do model selection using all available data at the same time.
In order to make sure that these decisions are not unduly influenced by the
type of algorithm used, by implementation details or by the use of wrong control
parameters, we compare the model selection results obtained with different R
packages, with different algorithms and checking control parameters.

### nlme

The nlme package was the first R extension providing facilities to fit nonlinear
mixed-effects models. We use would like to do model selection from all four
combinations of degradation models and error models based on the AIC.
However, fitting the DFOP model with constant variance and using default
control parameters results in an error, signalling that the maximum number
of 50 iterations was reached, potentially indicating overparameterisation.
However, the algorithm converges when the two-component error model is
used in combination with the DFOP model. This can be explained by the fact
that the smaller residues observed at later sampling times get more
weight when using the two-component error model which will counteract the
tendency of the algorithm to try parameter combinations unsuitable for
fitting these data.

```{r f_parent_nlme, warning = FALSE}
library(nlme)
f_parent_nlme_sfo_const <- nlme(f_parent_mkin_const["SFO", ])
#f_parent_nlme_dfop_const <- nlme(f_parent_mkin_const["DFOP", ])
# maxIter = 50 reached
f_parent_nlme_sfo_tc <- nlme(f_parent_mkin_tc["SFO", ])
f_parent_nlme_dfop_tc <- nlme(f_parent_mkin_tc["DFOP", ])
```

Note that overparameterisation is also indicated by warnings obtained when
fitting SFO or DFOP with the two-component error model ('false convergence' in
the 'LME step' in some iterations). In addition to these fits, attempts
were also made to include correlations between random effects by using the
log Cholesky parameterisation of the matrix specifying them. The code
used for these attempts can be made visible below.

```{r f_parent_nlme_logchol, warning = FALSE, eval = FALSE}
f_parent_nlme_sfo_const_logchol <- nlme(f_parent_mkin_const["SFO", ],
  random = pdLogChol(list(DMTA_0 ~ 1, log_k_DMTA ~ 1)))
anova(f_parent_nlme_sfo_const, f_parent_nlme_sfo_const_logchol) # not better
#f_parent_nlme_dfop_tc_logchol <- update(f_parent_nlme_dfop_tc,
#  random = pdLogChol(list(DMTA_0 ~ 1, log_k1 ~ 1, log_k2 ~ 1, g_qlogis ~ 1)))
# using log Cholesky parameterisation for random effects (nlme default) does
# not converge here and gives lots of warnings about the LME step not converging
```

The model comparison function of the nlme package can directly be applied
to these fits showing a similar goodness-of-fit of the SFO model, but a much
lower AIC for the DFOP model fitted with the two-component error model.
Also, the likelihood ratio test indicates that this difference is significant.
as the p-value is below 0.0001.

```{r AIC_parent_nlme}
anova(
  f_parent_nlme_sfo_const, f_parent_nlme_sfo_tc, f_parent_nlme_dfop_tc
)
```

The selected model (DFOP with two-component error) fitted to the data assuming
no correlations between random effects is shown below.

```{r plot_parent_nlme}
plot(f_parent_nlme_dfop_tc)
```

### saemix

The saemix package provided the first Open Source implementation of the
Stochastic Approximation to the Expectation Maximisation (SAEM) algorithm.
SAEM fits of degradation models can be performed using an interface to the
saemix package available in current development versions of the mkin package.

The corresponding SAEM fits of the four combinations of degradation and error
models are fitted below. As there is no convergence criterion implemented in
the saemix package, the convergence plots need to be manually checked for every
fit.

The convergence plot for the SFO model using constant variance is shown below.

```{r f_parent_saemix_sfo_const, results = 'hide'}
library(saemix)
f_parent_saemix_sfo_const <- mkin::saem(f_parent_mkin_const["SFO", ], quiet = TRUE,
  transformations = "saemix")
plot(f_parent_saemix_sfo_const$so, plot.type = "convergence")
```

Obviously the default number of iterations is sufficient to reach convergence.
This can also be said for the SFO fit using the two-component error model.

```{r f_parent_saemix_sfo_tc, results = 'hide'}
f_parent_saemix_sfo_tc <- mkin::saem(f_parent_mkin_tc["SFO", ], quiet = TRUE,
  transformations = "saemix")
plot(f_parent_saemix_sfo_tc$so, plot.type = "convergence")
```

When fitting the DFOP model with constant variance, parameter convergence
is not as unambiguous (see the failure of nlme with the default number of
iterations above). Therefore, the number of iterations in the first
phase of the algorithm was increased, leading to visually satisfying
convergence.

```{r f_parent_saemix_dfop_const, results = 'hide'}
f_parent_saemix_dfop_const <- mkin::saem(f_parent_mkin_const["DFOP", ], quiet = TRUE,
  control = saemixControl(nbiter.saemix = c(800, 200), print = FALSE,
    save = FALSE, save.graphs = FALSE, displayProgress = FALSE),
  transformations = "saemix")
plot(f_parent_saemix_dfop_const$so, plot.type = "convergence")
```

The same applies to the case where the DFOP model is fitted with the
two-component error model. Convergence of the variance of k2 is enhanced
by using the two-component error, it remains more or less stable already after 200
iterations of the first phase.

```{r f_parent_saemix_dfop_tc_moreiter, results = 'hide'}
f_parent_saemix_dfop_tc_moreiter <- mkin::saem(f_parent_mkin_tc["DFOP", ], quiet = TRUE,
  control = saemixControl(nbiter.saemix = c(800, 200), print = FALSE,
    save = FALSE, save.graphs = FALSE, displayProgress = FALSE),
  transformations = "saemix")
plot(f_parent_saemix_dfop_tc_moreiter$so, plot.type = "convergence")
```

The four combinations can be compared using the model comparison function from the
saemix package:

```{r AIC_parent_saemix}
compare.saemix(f_parent_saemix_sfo_const$so, f_parent_saemix_sfo_tc$so,
  f_parent_saemix_dfop_const$so, f_parent_saemix_dfop_tc_moreiter$so)
```

As in the case of nlme fits, the DFOP model fitted with two-component error
(number 4) gives the lowest AIC. The numeric values are reasonably close to
the ones obtained using nlme, considering that the algorithms for fitting the
model and for the likelihood calculation are quite different.

In order to check the influence of the likelihood calculation algorithms
implemented in saemix, the likelihood from Gaussian quadrature is added
to the best fit, and the AIC values obtained from the three methods
are compared.

```{r AIC_parent_saemix_methods}
f_parent_saemix_dfop_tc_moreiter$so <-
  llgq.saemix(f_parent_saemix_dfop_tc_moreiter$so)
AIC(f_parent_saemix_dfop_tc_moreiter$so)
AIC(f_parent_saemix_dfop_tc_moreiter$so, method = "gq")
AIC(f_parent_saemix_dfop_tc_moreiter$so, method = "lin")
```

The AIC values based on importance sampling and Gaussian quadrature are quite
similar. Using linearisation is less accurate, but still gives a similar value.


### nlmixr

In the last years, a lot of effort has been put into the nlmixr package which
is designed for pharmacokinetics, where nonlinear mixed-effects models are
routinely used, but which can also be used for related data like chemical
degradation data. A current development branch of the mkin package provides
an interface between mkin and nlmixr. Here, we check if we get equivalent
results when using a refined version of the First Order Conditional Estimation
(FOCE) algorithm used in nlme, namely First Order Conditional Estimation with
Interaction (FOCEI), and the SAEM algorithm as implemented in nlmixr.

First, the focei algorithm is used for the four model combinations and the
goodness of fit of the results is compared.

```{r f_parent_nlmixr_focei, results = "hide", message = FALSE, warning = FALSE}
library(nlmixr)
f_parent_nlmixr_focei_sfo_const <- nlmixr(f_parent_mkin_const["SFO", ], est = "focei")
f_parent_nlmixr_focei_sfo_tc <- nlmixr(f_parent_mkin_tc["SFO", ], est = "focei")
f_parent_nlmixr_focei_dfop_const <- nlmixr(f_parent_mkin_const["DFOP", ], est = "focei")
f_parent_nlmixr_focei_dfop_tc<- nlmixr(f_parent_mkin_tc["DFOP", ], est = "focei")
```

```{r AIC_parent_nlmixr_focei}
AIC(f_parent_nlmixr_focei_sfo_const$nm, f_parent_nlmixr_focei_sfo_tc$nm,
  f_parent_nlmixr_focei_dfop_const$nm, f_parent_nlmixr_focei_dfop_tc$nm)
```

The AIC values are very close to the ones obtained with nlme which are repeated below
for convenience.

```{r AIC_parent_nlme_rep}
AIC(
  f_parent_nlme_sfo_const, f_parent_nlme_sfo_tc, f_parent_nlme_dfop_tc
)
```

Secondly, we use the SAEM estimation routine and check the convergence plots for
SFO with constant variance

```{r f_parent_nlmixr_saem_sfo_const, results = "hide", warning = FALSE, message = FALSE}
f_parent_nlmixr_saem_sfo_const <- nlmixr(f_parent_mkin_const["SFO", ], est = "saem",
  control = nlmixr::saemControl(logLik = TRUE))
traceplot(f_parent_nlmixr_saem_sfo_const$nm)
```

for SFO with two-component error

```{r f_parent_nlmixr_saem_sfo_tc, results = "hide", warning = FALSE, message = FALSE}
f_parent_nlmixr_saem_sfo_tc <- nlmixr(f_parent_mkin_tc["SFO", ], est = "saem",
  control = nlmixr::saemControl(logLik = TRUE))
traceplot(f_parent_nlmixr_saem_sfo_tc$nm)
```

For DFOP with constant variance, the convergence plots show considerable instability
of the fit, which can be alleviated by increasing the number of iterations and
the number of parallel chains for the first phase of algorithm.

```{r f_parent_nlmixr_saem_dfop_const, results = "hide", warning = FALSE, message = FALSE}
f_parent_nlmixr_saem_dfop_const <- nlmixr(f_parent_mkin_const["DFOP", ], est = "saem",
  control = nlmixr::saemControl(logLik = TRUE, nBurn = 1000), nmc = 15)
traceplot(f_parent_nlmixr_saem_dfop_const$nm)
```

For DFOP with two-component error, the same increase in iterations and parallel
chains was used, but using the two-component error appears to lead to a less
erratic convergence, so this may not be necessary to this degree.


```{r f_parent_nlmixr_saem_dfop_tc, results = "hide", warning = FALSE, message = FALSE}
f_parent_nlmixr_saem_dfop_tc <- nlmixr(f_parent_mkin_tc["DFOP", ], est = "saem",
  control = nlmixr::saemControl(logLik = TRUE, nBurn = 1000, nmc = 15))
traceplot(f_parent_nlmixr_saem_dfop_tc$nm)
```

The AIC values are internally calculated using Gaussian quadrature. For an
unknown reason, the AIC value obtained for the DFOP fit using the two-component
error model is given as Infinity.

```{r AIC_parent_nlmixr_saem}
AIC(f_parent_nlmixr_saem_sfo_const$nm, f_parent_nlmixr_saem_sfo_tc$nm,
  f_parent_nlmixr_saem_dfop_const$nm, f_parent_nlmixr_saem_dfop_tc$nm)
```

The following table gives the AIC values obtained with the three packages.

```{r AIC_all}
AIC_all <- data.frame(
  "Degradation model" = c("SFO", "SFO", "DFOP", "DFOP"),
  "Error model" = c("const", "tc", "const", "tc"),
  nlme = c(AIC(f_parent_nlme_sfo_const), AIC(f_parent_nlme_sfo_tc), NA, AIC(f_parent_nlme_dfop_tc)),
  nlmixr_focei = sapply(list(f_parent_nlmixr_focei_sfo_const$nm, f_parent_nlmixr_focei_sfo_tc$nm,
  f_parent_nlmixr_focei_dfop_const$nm, f_parent_nlmixr_focei_dfop_tc$nm), AIC),
  saemix = sapply(list(f_parent_saemix_sfo_const$so, f_parent_saemix_sfo_tc$so,
    f_parent_saemix_dfop_const$so, f_parent_saemix_dfop_tc_moreiter$so), AIC),
  nlmixr_saem = sapply(list(f_parent_nlmixr_saem_sfo_const$nm, f_parent_nlmixr_saem_sfo_tc$nm,
  f_parent_nlmixr_saem_dfop_const$nm, f_parent_nlmixr_saem_dfop_tc$nm), AIC)
)
kable(AIC_all)
```

# References

<!-- vim: set foldmethod=syntax: -->

Contact - Imprint