vignettes/FOCUS_L.Rmd


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266

---
title: "Example evaluation of FOCUS Laboratory Data L1 to L3"
author: "Johannes Ranke"
date: "`r Sys.Date()`"
output:
  html_document:
    toc: true
    toc_float: true
    mathjax: null
references:
- id: ranke2014
  title: <span class="nocase">Prüfung und Validierung von Modellierungssoftware als Alternative zu
    ModelMaker 4.0</span>
  author:
  - family: Ranke
    given: Johannes
  type: report
  year: 2014
  number: Umweltbundesamt Projektnummer 27452
vignette: >
  %\VignetteIndexEntry{Example evaluation of FOCUS Laboratory Data L1 to L3}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

```{r, include = FALSE}
library(knitr)
opts_chunk$set(tidy = FALSE, cache = TRUE)
```

# Laboratory Data L1

The following code defines example dataset L1 from the FOCUS kinetics
report, p. 284:

```{r}
library("mkin", quietly = TRUE)
FOCUS_2006_L1 = data.frame(
  t = rep(c(0, 1, 2, 3, 5, 7, 14, 21, 30), each = 2),
  parent = c(88.3, 91.4, 85.6, 84.5, 78.9, 77.6, 
             72.0, 71.9, 50.3, 59.4, 47.0, 45.1,
             27.7, 27.3, 10.0, 10.4, 2.9, 4.0))
FOCUS_2006_L1_mkin <- mkin_wide_to_long(FOCUS_2006_L1)
```

Here we use the assumptions of simple first order (SFO), the case of declining
rate constant over time (FOMC) and the case of two different phases of the
kinetics (DFOP). For a more detailed discussion of the models, please see the
FOCUS kinetics report.

Since mkin version 0.9-32 (July 2014), we can use shorthand notation like `"SFO"`
for parent only degradation models. The following two lines fit the model and
produce the summary report of the model fit. This covers the numerical analysis
given in the FOCUS report. 

```{r}
m.L1.SFO <- mkinfit("SFO", FOCUS_2006_L1_mkin, quiet = TRUE)
summary(m.L1.SFO)
```

A plot of the fit is obtained with the plot function for mkinfit objects.

```{r fig.width = 6, fig.height = 5}
plot(m.L1.SFO, show_errmin = TRUE, main = "FOCUS L1 - SFO")
```

The residual plot can be easily obtained by

```{r fig.width = 6, fig.height = 5}
mkinresplot(m.L1.SFO, ylab = "Observed", xlab = "Time")
```

For comparison, the FOMC model is fitted as well, and the $\chi^2$ error level
is checked.

```{r fig.width = 6, fig.height = 5}
m.L1.FOMC <- mkinfit("FOMC", FOCUS_2006_L1_mkin, quiet=TRUE)
plot(m.L1.FOMC, show_errmin = TRUE, main = "FOCUS L1 - FOMC")
summary(m.L1.FOMC, data = FALSE)
```

We get a warning that the default optimisation algorithm `Port` did not converge, which
is an indication that the model is overparameterised, *i.e.* contains too many 
parameters that are ill-defined as a consequence.

And in fact, due to the higher number of parameters, and the lower number of
degrees of freedom of the fit, the $\chi^2$ error level is actually higher for
the FOMC model (3.6%) than for the SFO model (3.4%). Additionally, the
parameters `log_alpha` and `log_beta` internally fitted in the model have
excessive confidence intervals, that span more than 25 orders of magnitude (!)
when backtransformed to the scale of `alpha` and `beta`. Also, the t-test
for significant difference from zero does not indicate such a significant difference,
with p-values greater than 0.1, and finally, the parameter correlation of `log_alpha`
and `log_beta` is 1.000, clearly indicating that the model is overparameterised. 

The $\chi^2$ error levels reported in Appendix 3 and Appendix 7 to the FOCUS
kinetics report are rounded to integer percentages and partly deviate by one
percentage point from the results calculated by mkin. The reason for
this is not known. However, mkin gives the same $\chi^2$ error levels
as the kinfit package and the calculation routines of the kinfit package have
been extensively compared to the results obtained by the KinGUI
software, as documented in the kinfit package vignette. KinGUI was the first
widely used standard package in this field. Also, the calculation of
$\chi^2$ error levels was compared with KinGUII, CAKE and DegKin manager in 
a project sponsored by the German Umweltbundesamt [@ranke2014].

# Laboratory Data L2

The following code defines example dataset L2 from the FOCUS kinetics
report, p. 287:

```{r}
FOCUS_2006_L2 = data.frame(
  t = rep(c(0, 1, 3, 7, 14, 28), each = 2),
  parent = c(96.1, 91.8, 41.4, 38.7,
             19.3, 22.3, 4.6, 4.6,
             2.6, 1.2, 0.3, 0.6))
FOCUS_2006_L2_mkin <- mkin_wide_to_long(FOCUS_2006_L2)
```

## SFO fit for L2

Again, the SFO model is fitted and the result is plotted. The residual plot
can be obtained simply by adding the argument `show_residuals` to the plot
command.

```{r fig.width = 7, fig.height = 6}
m.L2.SFO <- mkinfit("SFO", FOCUS_2006_L2_mkin, quiet=TRUE)
plot(m.L2.SFO, show_residuals = TRUE, show_errmin = TRUE, 
     main = "FOCUS L2 - SFO")
```

The $\chi^2$ error level of 14% suggests that the model does not fit very well.
This is also obvious from the plots of the fit, in which we have included 
the residual plot.

In the FOCUS kinetics report, it is stated that there is no apparent systematic
error observed from the residual plot up to the measured DT90 (approximately at
day 5), and there is an underestimation beyond that point.

We may add that it is difficult to judge the random nature of the residuals just 
from the three samplings at days 0, 1 and 3. Also, it is not clear _a
priori_ why a consistent underestimation after the approximate DT90 should be
irrelevant. However, this can be rationalised by the fact that the FOCUS fate
models generally only implement SFO kinetics.

## FOMC fit for L2

For comparison, the FOMC model is fitted as well, and the $\chi^2$ error level
is checked.

```{r fig.width = 7, fig.height = 6}
m.L2.FOMC <- mkinfit("FOMC", FOCUS_2006_L2_mkin, quiet = TRUE)
plot(m.L2.FOMC, show_residuals = TRUE,
     main = "FOCUS L2 - FOMC")
summary(m.L2.FOMC, data = FALSE)
```

The error level at which the $\chi^2$ test passes is much lower in this case.
Therefore, the FOMC model provides a better description of the data, as less
experimental error has to be assumed in order to explain the data.

## DFOP fit for L2

Fitting the four parameter DFOP model further reduces the $\chi^2$ error level. 

```{r fig.width = 7, fig.height = 6}
m.L2.DFOP <- mkinfit("DFOP", FOCUS_2006_L2_mkin, quiet = TRUE)
plot(m.L2.DFOP, show_residuals = TRUE, show_errmin = TRUE,
     main = "FOCUS L2 - DFOP")
summary(m.L2.DFOP, data = FALSE)
```

Here, the DFOP model is clearly the best-fit model for dataset L2 based on the 
chi^2 error level criterion. However, the failure to calculate the covariance
matrix indicates that the parameter estimates correlate excessively. Therefore,
the FOMC model may be preferred for this dataset.

# Laboratory Data L3

The following code defines example dataset L3 from the FOCUS kinetics report,
p. 290.

```{r}
FOCUS_2006_L3 = data.frame(
  t = c(0, 3, 7, 14, 30, 60, 91, 120),
  parent = c(97.8, 60, 51, 43, 35, 22, 15, 12))
FOCUS_2006_L3_mkin <- mkin_wide_to_long(FOCUS_2006_L3)
```

## Use mmkin to fit multiple models

As of mkin version 0.9-39 (June 2015), we can fit several models to 
one or more datasets in one call to the function `mmkin`. The datasets
have to be passed in a list, in this case a named list holding only
the L3 dataset prepared above.

```{r fig.height = 8}
mm.L3 <- mmkin(c("SFO", "FOMC", "DFOP"), 
               list("FOCUS L3" = FOCUS_2006_L3_mkin), quiet = TRUE)
plot(mm.L3)
```

The $\chi^2$ error level of 21% as well as the plot suggest that the SFO model
does not fit very well.  The FOMC model performs better, with an
error level at which the $\chi^2$ test passes of 7%.  Fitting the four
parameter DFOP model further reduces the $\chi^2$ error level
considerably.

## Accessing elements of mmkin objects

The objects returned by mmkin are arranged like a matrix, with 
models as a row index and datasets as a column index.

We can extract the summary and plot for *e.g.* the DFOP fit,
using square brackets for indexing which will result in the use of
the summary and plot functions working on mkinfit objects.

```{r fig.height = 5}
summary(mm.L3[["DFOP", 1]])
plot(mm.L3[["DFOP", 1]], show_errmin = TRUE)
```

Here, a look to the model plot, the confidence intervals of the parameters 
and the correlation matrix suggest that the parameter estimates are reliable, and
the DFOP model can be used as the best-fit model based on the $\chi^2$ error
level criterion for laboratory data L3.

This is also an example where the standard t-test for the parameter `g_ilr` is
misleading, as it tests for a significant difference from zero. In this case, 
zero appears to be the correct value for this parameter, and the confidence 
interval for the backtransformed parameter `g` is quite narrow.

# Laboratory Data L4

The following code defines example dataset L4 from the FOCUS kinetics
report, p. 293:

```{r}
FOCUS_2006_L4 = data.frame(
  t = c(0, 3, 7, 14, 30, 60, 91, 120),
  parent = c(96.6, 96.3, 94.3, 88.8, 74.9, 59.9, 53.5, 49.0))
FOCUS_2006_L4_mkin <- mkin_wide_to_long(FOCUS_2006_L4)
```

Fits of the SFO and FOMC models, plots and summaries are produced below:

```{r fig.height = 6}
mm.L4 <- mmkin(c("SFO", "FOMC"), 
               list("FOCUS L4" = FOCUS_2006_L4_mkin), 
               quiet = TRUE)
plot(mm.L4)
```

The $\chi^2$ error level of 3.3% as well as the plot suggest that the SFO model
fits very well.  The error level at which the $\chi^2$ test passes is slightly
lower for the FOMC model. However, the difference appears negligible.


```{r fig.height = 8}
summary(mm.L4[["SFO", 1]], data = FALSE)
summary(mm.L4[["FOMC", 1]], data = FALSE)
```


# References