vignettes/prebuilt/2023_mesotrione_parent.rmd


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596

---
title: "Testing covariate modelling in hierarchical parent degradation kinetics with residue data on mesotrione"
author: Johannes Ranke
date: Last change on 4 August 2023, last compiled on `r format(Sys.time(), "%e
  %B %Y")`
output:
  pdf_document:
    extra_dependencies: ["float", "listing"]
toc: yes
geometry: margin=2cm
---

```{r setup, echo = FALSE, cache = FALSE}
options(width = 80) # For summary listings
knitr::opts_chunk$set(
  cache = TRUE,
  comment = "", tidy = FALSE,
  fig.pos = "H", fig.align = "center"
)
```

\clearpage

# Introduction

The purpose of this document is to test demonstrate how nonlinear hierarchical
models (NLHM) based on the parent degradation models SFO, FOMC, DFOP and HS
can be fitted with the mkin package, also considering the influence of
covariates like soil pH on different degradation parameters. Because in some
other case studies, the SFORB parameterisation of biexponential decline has
shown some advantages over the DFOP parameterisation, SFORB was included
in the list of tested models as well.

The mkin package is used in version `r packageVersion("mkin")`, which is
contains the functions that were used for the evaluations. The `saemix` package
is used as a backend for fitting the NLHM, but is also loaded to make the
convergence plot function available.

This document is processed with the `knitr` package, which also provides the
`kable` function that is used to improve the display of tabular data in R
markdown documents. For parallel processing, the `parallel` package is used.

```{r packages, cache = FALSE, message = FALSE}
library(mkin)
library(knitr)
library(saemix)
library(parallel)
n_cores <- detectCores()
if (Sys.info()["sysname"] == "Windows") {
  cl <- makePSOCKcluster(n_cores)
} else {
  cl <- makeForkCluster(n_cores)
}
```

\clearpage

## Test data

```{r data}
data_file <- system.file(
  "testdata", "mesotrione_soil_efsa_2016.xlsx", package = "mkin")
meso_ds <- read_spreadsheet(data_file, parent_only = TRUE)
```

The following tables show the covariate data and the `r length(meso_ds)`
datasets that were read in from the spreadsheet file.

```{r show-covar-data, dependson = "data", results = "asis"}
pH <- attr(meso_ds, "covariates")
kable(pH, caption = "Covariate data")
```

\clearpage

```{r show-data, dependson = "data", results = "asis"}
for (ds_name in names(meso_ds)) {
  print(
    kable(mkin_long_to_wide(meso_ds[[ds_name]]),
      caption = paste("Dataset", ds_name),
      booktabs = TRUE, row.names = FALSE))
}
```

\clearpage

# Separate evaluations

In order to obtain suitable starting parameters for the NLHM fits,
separate fits of the five models to the data for each
soil are generated using the `mmkin` function from the mkin package.
In a first step, constant variance is assumed. Convergence
is checked with the `status` function.

```{r f-sep-const, dependson = "data"}
deg_mods <- c("SFO", "FOMC", "DFOP", "SFORB", "HS")
f_sep_const <- mmkin(
  deg_mods,
  meso_ds,
  error_model = "const",
  cluster = cl,
  quiet = TRUE)
```

```{r dependson = "f-sep-const"}
status(f_sep_const[, 1:5]) |> kable()
status(f_sep_const[, 6:18]) |> kable()
```

In the tables above, OK indicates convergence and C indicates failure to
converge. Most separate fits with constant variance converged, with the
exception of two FOMC fits, one SFORB fit and one HS fit.

```{r f-sep-tc, dependson = "f-sep-const"}
f_sep_tc <- update(f_sep_const, error_model = "tc")
```

```{r dependson = "f-sep-tc"}
status(f_sep_tc[, 1:5]) |> kable()
status(f_sep_tc[, 6:18]) |> kable()
```

With the two-component error model, the set of fits that did not converge
is larger, with convergence problems appearing for a number of non-SFO fits.

\clearpage

# Hierarchical model fits without covariate effect

The following code fits hierarchical kinetic models for the ten combinations of
the five different degradation models with the two different error models in
parallel.

```{r f-saem-1, dependson = c("f-sep-const", "f-sep-tc")}
f_saem_1 <- mhmkin(list(f_sep_const, f_sep_tc), cluster = cl)
status(f_saem_1) |> kable()
```

All fits terminate without errors (status OK).

```{r dependson = "f-saem-1"}
anova(f_saem_1) |> kable(digits = 1)
```
The model comparisons show that the fits with constant variance are consistently
preferable to the corresponding fits with two-component error for these data.
This is confirmed by the fact that the parameter `b.1` (the relative standard
deviation in the fits obtained with the saemix package), is ill-defined in all
fits.

```{r dependson = "f-saem-1"}
illparms(f_saem_1) |> kable()
```

For obtaining fits with only well-defined random effects, we update
the set of fits, excluding random effects that were ill-defined
according to the `illparms` function.

```{r f-saem-2, dependson = "f-saem-1"}
f_saem_2 <- update(f_saem_1, no_random_effect = illparms(f_saem_1))
status(f_saem_2) |> kable()
```

The updated fits terminate without errors.

```{r dependson = "f-saem-2"}
illparms(f_saem_2) |> kable()
```

No ill-defined errors remain in the fits with constant variance.

\clearpage

# Hierarchical model fits with covariate effect

In the following sections, hierarchical fits including a model for the
influence of pH on selected degradation parameters are shown for all parent
models. Constant variance is selected as the error model based on the fits
without covariate effects. Random effects that were ill-defined in the fits
without pH influence are excluded. A potential influence of the soil pH is only
included for parameters with a well-defined random effect, because experience
has shown that for such parameters no significant pH effect could be found.

## SFO

```{r sfo-pH, dependson = "f-sep-const"}
sfo_pH <- saem(f_sep_const["SFO", ], no_random_effect = "meso_0", covariates = pH,
  covariate_models = list(log_k_meso ~ pH))
```

```{r dependson = "sfo-pH"}
summary(sfo_pH)$confint_trans |> kable(digits = 2)
```

The parameter showing the pH influence in the above table is
`beta_pH(log_k_meso)`. Its confidence interval does not include zero,
indicating that the influence of soil pH on the log of the degradation rate
constant is significantly greater than zero.


```{r dependson = "sfo-pH"}
anova(f_saem_2[["SFO", "const"]], sfo_pH, test = TRUE)
```

The comparison with the SFO fit without covariate effect confirms that
considering the soil pH improves the model, both by comparison of AIC and BIC
and by the likelihood ratio test.

\clearpage

```{r dependson = "sfo-pH", plot.height = 5}
plot(sfo_pH)
```

Endpoints for a model with covariates are by default calculated for
the median of the covariate values. This quantile can be adapted,
or a specific covariate value can be given as shown below.

```{r dependson = "sfo-pH", plot.height = 5}
endpoints(sfo_pH)
endpoints(sfo_pH, covariate_quantile = 0.9)
endpoints(sfo_pH, covariates = c(pH = 7.0))
```

\clearpage

## FOMC

```{r fomc-pH, dependson = "f-sep-const"}
fomc_pH <- saem(f_sep_const["FOMC", ], no_random_effect = "meso_0", covariates = pH,
  covariate_models = list(log_alpha ~ pH))
```

```{r dependson = "fomc-pH"}
summary(fomc_pH)$confint_trans |> kable(digits = 2)
```

As in the case of SFO, the confidence interval of the slope parameter
(here `beta_pH(log_alpha)`) quantifying the influence of soil pH
does not include zero, and the model comparison clearly indicates
that the model with covariate influence is preferable.
However, the random effect for `alpha` is not well-defined any
more after inclusion of the covariate effect (the confidence
interval of `SD.log_alpha` includes zero).

```{r dependson = "fomc-pH"}
illparms(fomc_pH)
```

Therefore, the model is updated without this random effect, and
no ill-defined parameters remain.

```{r fomc-pH-2, dependson = "fomc_pH"}
fomc_pH_2 <- update(fomc_pH, no_random_effect = c("meso_0", "log_alpha"))
illparms(fomc_pH_2)
```

```{r dependson = "fomc-pH-2"}
anova(f_saem_2[["FOMC", "const"]], fomc_pH, fomc_pH_2, test = TRUE)
```

Model comparison indicates that including pH dependence significantly improves
the fit, and that the reduced model with covariate influence results in
the most preferable FOMC fit.

```{r dependson = "fomc-pH"}
summary(fomc_pH_2)$confint_trans |> kable(digits = 2)
```

\clearpage

```{r dependson = "fomc-pH", plot.height = 5}
plot(fomc_pH_2)
```

```{r dependson = "fomc-pH", plot.height = 5}
endpoints(fomc_pH_2)
endpoints(fomc_pH_2, covariates = c(pH = 7))
```

\clearpage

## DFOP

In the DFOP fits without covariate effects, random effects for two degradation
parameters (`k2` and `g`) were identifiable.

```{r dependson = "f-saem-2"}
summary(f_saem_2[["DFOP", "const"]])$confint_trans |> kable(digits = 2)
```

A fit with pH dependent degradation parameters was obtained by excluding
the same random effects as in the refined DFOP fit without covariate influence,
and including covariate models for the two identifiable parameters `k2` and `g`.

```{r dfop-pH, dependson = "f-sep-const"}
dfop_pH <- saem(f_sep_const["DFOP", ], no_random_effect = c("meso_0", "log_k1"),
  covariates = pH,
  covariate_models = list(log_k2 ~ pH, g_qlogis ~ pH))
```

The corresponding parameters for
the influence of soil pH are `beta_pH(log_k2)` for the influence of soil pH on
`k2`, and `beta_pH(g_qlogis)` for its influence on `g`.

```{r dependson = "dfop-pH"}
summary(dfop_pH)$confint_trans |> kable(digits = 2)
illparms(dfop_pH)
```

Confidence intervals for neither of them include zero, indicating a significant
difference from zero. However, the random effect for `g` is now ill-defined.
The fit is updated without this ill-defined random effect.

```{r dfop-pH-2, dependson = "dfop-pH"}
dfop_pH_2 <- update(dfop_pH,
  no_random_effect = c("meso_0", "log_k1", "g_qlogis"))
illparms(dfop_pH_2)
```

Now, the slope parameter for the pH effect on `g` is ill-defined.
Therefore, another attempt is made without the corresponding covariate model.

```{r dfop-pH-3, dependson = "f-sep-const"}
dfop_pH_3 <- saem(f_sep_const["DFOP", ], no_random_effect = c("meso_0", "log_k1"),
  covariates = pH,
  covariate_models = list(log_k2 ~ pH))
illparms(dfop_pH_3)
```
As the random effect for `g` is again ill-defined, the fit is repeated without it.

```{r dfop-pH-4, dependson = "dfop-pH-3"}
dfop_pH_4 <- update(dfop_pH_3, no_random_effect = c("meso_0", "log_k1", "g_qlogis"))
illparms(dfop_pH_4)
```

While no ill-defined parameters remain, model comparison suggests that the previous
model `dfop_pH_2` with two pH dependent parameters is preferable, based on
information criteria as well as based on the likelihood ratio test.

```{r dependson = "dfop-pH-4"}
anova(f_saem_2[["DFOP", "const"]], dfop_pH, dfop_pH_2, dfop_pH_3, dfop_pH_4)
anova(dfop_pH_2, dfop_pH_4, test = TRUE)
```

When focussing on parameter identifiability using the test if the confidence
interval includes zero, `dfop_pH_4` would still be the preferred model.
However, it should be kept in mind that parameter confidence intervals are
constructed using a simple linearisation of the likelihood. As the confidence
interval of the random effect for `g` only marginally includes zero,
it is suggested that this is acceptable, and that `dfop_pH_2` can be considered
the most preferable model.

\clearpage

```{r dependson = "dfop-pH-2", plot.height = 5}
plot(dfop_pH_2)
```

```{r dependson = "dfop-pH-2", plot.height = 5}
endpoints(dfop_pH_2)
endpoints(dfop_pH_2, covariates = c(pH = 7))
```

\clearpage

## SFORB

```{r sforb-pH, dependson = "f-sep-const"}
sforb_pH <- saem(f_sep_const["SFORB", ], no_random_effect = c("meso_free_0", "log_k_meso_free_bound"),
  covariates = pH,
  covariate_models = list(log_k_meso_free ~ pH, log_k_meso_bound_free ~ pH))
```

```{r dependson = "sforb-pH"}
summary(sforb_pH)$confint_trans |> kable(digits = 2)
```

The confidence interval of `beta_pH(log_k_meso_bound_free)` includes zero,
indicating that the influence of soil pH on `k_meso_bound_free` cannot reliably
be quantified. Also, the confidence interval for the random effect on this
parameter (`SD.log_k_meso_bound_free`) includes zero.

Using the `illparms` function, these ill-defined parameters can be found
more conveniently.

```{r dependson = "sforb-pH"}
illparms(sforb_pH)
```

To remove the ill-defined parameters, a second variant of the SFORB model
with pH influence is fitted. No ill-defined parameters remain.

```{r sforb-pH-2, dependson = "f-sforb-pH"}
sforb_pH_2 <- update(sforb_pH,
  no_random_effect = c("meso_free_0", "log_k_meso_free_bound", "log_k_meso_bound_free"),
  covariate_models = list(log_k_meso_free ~ pH))
illparms(sforb_pH_2)
```

The model comparison of the SFORB fits includes the refined model without
covariate effect, and both versions of the SFORB fit with covariate effect.

```{r dependson = "sforb-pH-2"}
anova(f_saem_2[["SFORB", "const"]], sforb_pH, sforb_pH_2, test = TRUE)
```
The first model including pH influence is preferable based on information criteria
and the likelihood ratio test. However, as it is not fully identifiable,
the second model is selected.

```{r dependson = "sforb-pH-2"}
summary(sforb_pH_2)$confint_trans |> kable(digits = 2)
```
\clearpage

```{r dependson = "sforb-pH-2", plot.height = 5}
plot(sforb_pH_2)
```

```{r dependson = "sforb-pH-2", plot.height = 5}
endpoints(sforb_pH_2)
endpoints(sforb_pH_2, covariates = c(pH = 7))
```

\clearpage

## HS

```{r hs-pH, dependson = "f-sep-const"}
hs_pH <- saem(f_sep_const["HS", ], no_random_effect = c("meso_0"),
  covariates = pH,
  covariate_models = list(log_k1 ~ pH, log_k2 ~ pH, log_tb ~ pH))
```

```{r dependson = "hs-pH"}
summary(hs_pH)$confint_trans |> kable(digits = 2)
illparms(hs_pH)
```

According to the output of the `illparms` function, the random effect on
the break time `tb` cannot reliably be quantified, neither can the influence of
soil pH on `tb`. The fit is repeated without the corresponding covariate
model, and no ill-defined parameters remain.

```{r hs-pH-2, dependson = "hs-pH"}
hs_pH_2 <- update(hs_pH, covariate_models = list(log_k1 ~ pH, log_k2 ~ pH))
illparms(hs_pH_2)
```

Model comparison confirms that this model is preferable to the fit without
covariate influence, and also to the first version with covariate influence.

```{r dependson = c("hs-pH-2", "hs-pH-3")}
anova(f_saem_2[["HS", "const"]], hs_pH, hs_pH_2, test = TRUE)
```

```{r dependson = "hs-pH-2"}
summary(hs_pH_2)$confint_trans |> kable(digits = 2)
```

\clearpage

```{r dependson = "hs-pH-2", plot.height = 5}
plot(hs_pH_2)
```

```{r dependson = "hs-pH-2", plot.height = 5}
endpoints(hs_pH_2)
endpoints(hs_pH_2, covariates = c(pH = 7))
```

\clearpage

## Comparison across parent models

After model reduction for all models with pH influence, they are compared with
each other.

```{r, dependson = c("sfo-pH-2", "fomc-pH-2", "dfop-pH-4", "sforb-pH-1", "hs-pH-3")}
anova(sfo_pH, fomc_pH_2, dfop_pH_2, dfop_pH_4, sforb_pH_2, hs_pH_2)
```

The DFOP model with pH influence on `k2` and `g` and a random effect only on
`k2` is finally selected as the best fit.

The endpoints resulting from this model are listed below. Please refer to the
Appendix for a detailed listing.

```{r, dependson = "dfop-pH-2"}
endpoints(dfop_pH_2)
endpoints(dfop_pH_2, covariates = c(pH = 7))
```

# Conclusions

These evaluations demonstrate that covariate effects can be included
for all types of parent degradation models. These models can then
be further refined to make them fully identifiable.

\clearpage

# Appendix

## Hierarchical fit listings

### Fits without covariate effects

```{r, cache = FALSE, results = "asis", echo = FALSE}
errmods <- c(const = "constant variance", tc = "two-component error")
for (deg_mod in deg_mods) {
  for (err_mod in c("const")) {
    fit <- f_saem_1[[deg_mod, err_mod]]
    if (!inherits(fit$so, "try-error")) {
      caption <- paste("Hierarchical", deg_mod, "fit with", errmods[err_mod])
      tex_listing(fit, caption)
    }
  }
}
```

### Fits with covariate effects

```{r listing-sfo, cache = FALSE, results = "asis", echo = FALSE}
tex_listing(sfo_pH, "Hierarchichal SFO fit with pH influence")
```

\clearpage

```{r, cache = FALSE, results = "asis", echo = FALSE}
tex_listing(fomc_pH, "Hierarchichal FOMC fit with pH influence")
```

\clearpage

```{r, cache = FALSE, results = "asis", echo = FALSE}
tex_listing(fomc_pH_2, "Refined hierarchichal FOMC fit with pH influence")
```

\clearpage

```{r, cache = FALSE, results = "asis", echo = FALSE}
tex_listing(dfop_pH, "Hierarchichal DFOP fit with pH influence")
```

\clearpage

```{r, cache = FALSE, results = "asis", echo = FALSE}
tex_listing(dfop_pH_2, "Refined hierarchical DFOP fit with pH influence")
```

\clearpage

```{r, cache = FALSE, results = "asis", echo = FALSE}
tex_listing(dfop_pH_4, "Further refined hierarchical DFOP fit with pH influence")
```

\clearpage

```{r, cache = FALSE, results = "asis", echo = FALSE}
tex_listing(sforb_pH, "Hierarchichal SFORB fit with pH influence")
```
\clearpage

```{r, cache = FALSE, results = "asis", echo = FALSE}
tex_listing(sforb_pH_2, "Refined hierarchichal SFORB fit with pH influence")
```
\clearpage

```{r, cache = FALSE, results = "asis", echo = FALSE}
tex_listing(hs_pH, "Hierarchichal HS fit with pH influence")
```
\clearpage

```{r, cache = FALSE, results = "asis", echo = FALSE}
tex_listing(hs_pH_2, "Refined hierarchichal HS fit with pH influence")
```

\clearpage

## Session info

```{r, echo = FALSE, cache = FALSE}
parallel::stopCluster(cl = cl)
sessionInfo()
```


## Hardware info

```{r, echo = FALSE}
if(!inherits(try(cpuinfo <- readLines("/proc/cpuinfo")), "try-error")) {
  cat(gsub("model name\t: ", "CPU model: ", cpuinfo[grep("model name", cpuinfo)[1]]))
}
if(!inherits(try(meminfo <- readLines("/proc/meminfo")), "try-error")) {
  cat(gsub("model name\t: ", "System memory: ", meminfo[grep("MemTotal", meminfo)[1]]))
}
```