path: root/vignettes/web_only/multistart.rmd



---
title: Short demo of the multistart method
author: Johannes Ranke
date: Last change 20 April 2023 (rebuilt `r Sys.Date()`)
output:
  html_document
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{Short demo of the multistart method}
  %\VignetteEncoding{UTF-8}
---

The dimethenamid data from 2018 from seven soils is used as example data in this vignette.

```{r}
library(mkin)
dmta_ds <- lapply(1:7, function(i) {
  ds_i <- dimethenamid_2018$ds[[i]]$data
  ds_i[ds_i$name == "DMTAP", "name"] <-  "DMTA"
  ds_i$time <- ds_i$time * dimethenamid_2018$f_time_norm[i]
  ds_i
})
names(dmta_ds) <- sapply(dimethenamid_2018$ds, function(ds) ds$title)
dmta_ds[["Elliot"]] <- rbind(dmta_ds[["Elliot 1"]], dmta_ds[["Elliot 2"]])
dmta_ds[["Elliot 1"]] <- dmta_ds[["Elliot 2"]] <- NULL
```

First, we check the DFOP model with the two-component error model and
random effects for all degradation parameters.

```{r}
f_mmkin <- mmkin("DFOP", dmta_ds, error_model = "tc", cores = 7, quiet = TRUE)
f_saem_full <- saem(f_mmkin)
illparms(f_saem_full)
```
We see that not all variability parameters are identifiable. The `illparms`
function tells us that the confidence interval for the standard deviation
of 'log_k2' includes zero. We check this assessment using multiple runs
with different starting values.

```{r, warnings = FALSE}
f_saem_full_multi <- multistart(f_saem_full, n = 16, cores = 16)
parplot(f_saem_full_multi, lpos = "topleft")
```

This confirms that the variance of k2 is the most problematic parameter, so we
reduce the parameter distribution model by removing the intersoil variability
for k2.

```{r}
f_saem_reduced <- stats::update(f_saem_full, no_random_effect = "log_k2")
illparms(f_saem_reduced)
f_saem_reduced_multi <- multistart(f_saem_reduced, n = 16, cores = 16)
parplot(f_saem_reduced_multi, lpos = "topright", ylim = c(0.5, 2))
```

The results confirm that all remaining parameters can be determined with sufficient
certainty.

We can also analyse the log-likelihoods obtained in the multiple runs:

```{r}
llhist(f_saem_reduced_multi)
```

We can use the `anova` method to compare the models.

```{r}
anova(f_saem_full, best(f_saem_full_multi),
  f_saem_reduced, best(f_saem_reduced_multi), test = TRUE)
```

The reduced model results in lower AIC and BIC values, so it
is clearly preferable. Using multiple starting values gives
a large improvement in case of the full model, because it is
less well-defined, which impedes convergence. For the reduced
model, using multiple starting values only results in a small
improvement of the model fit.