# chents

[![Online
documentation](https://img.shields.io/badge/docs-jrwb.de-blue.svg)](https://pkgdown.jrwb.de/chents/)
[![R-Universe
status](https://jranke.r-universe.dev/badges/chents)](https://jranke.r-universe.dev/chents)
[![Code
coverage](https://img.shields.io/badge/coverage-jrwb.de-blue.svg)](https://pkgdown.jrwb.de/chents/coverage/coverage.html)

When working with data on chemical substances, we often need a reliable
link between the data and the chemical identity of the substances. The R
package **chents** provides a way to define and check the identity of
chemically defined substances (“chemical entities”) and to collect
related information.

When first defining a chemical entity, some chemical information is
retrieved from the [PubChem](https://pubchem.ncbi.nlm.nih.gov/) website
using the [webchem](https://docs.ropensci.org/webchem/) package.

``` r
library(chents)
caffeine <- chent$new("Caffeine")
#> Querying PubChem for name Caffeine ...
#> Get chemical information from RDKit using PubChem SMILES
#> CN1C=NC2=C1C(=O)N(C(=O)N2C)C
```

If Python and [RDKit](https://rdkit.org) (\> 2015.03) are installed and
configured for use with the
[reticulate](https://rstudio.github.io/reticulate/) package, some
additional chemical information, including a 2D graph, is computed.
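To give an idea of what RDKit contributes, the sketch below calls RDKit directly through reticulate on the caffeine SMILES retrieved above. This is not how chents itself is implemented. It only assumes that the `rdkit` Python module can be imported via reticulate; if it cannot, the results simply stay `NA`.

```r
library(reticulate)

# PubChem SMILES for caffeine, as shown in the output above
caffeine_smiles <- "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"
mw <- NA_real_
n_atoms <- NA_integer_

if (py_module_available("rdkit")) {
  Chem <- import("rdkit.Chem")
  Descriptors <- import("rdkit.Chem.Descriptors")
  AllChem <- import("rdkit.Chem.AllChem")

  # Parse the SMILES string into an RDKit molecule
  mol <- Chem$MolFromSmiles(caffeine_smiles)
  # Number of heavy atoms (hydrogens are implicit in SMILES)
  n_atoms <- mol$GetNumAtoms()
  # Average molecular weight in g/mol
  mw <- Descriptors$MolWt(mol)
  # Assign 2D coordinates, as needed for drawing a structure graph
  conf_id <- AllChem$Compute2DCoords(mol)
}
```

With RDKit available, `mw` should come out close to the 194.2 printed by chents below, and `n_atoms` counts the 14 heavy atoms of C8H10N4O2.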

The print method gives an overview of the information that was
collected.

``` r
print(caffeine)
#> <chent>
#> Identifier $identifier Caffeine 
#> InChI Key $inchikey RYYVLZVUVIJVGH-UHFFFAOYSA-N 
#> SMILES string $smiles:
#>                        PubChem 
#> "CN1C=NC2=C1C(=O)N(C(=O)N2C)C" 
#> Molecular weight $mw: 194.2 
#> PubChem synonyms (up to 10):
#>  [1] "caffeine"                "58-08-2"                
#>  [3] "Guaranine"               "1,3,7-Trimethylxanthine"
#>  [5] "Methyltheobromine"       "Theine"                 
#>  [7] "Thein"                   "Cafeina"                
#>  [9] "Caffein"                 "Cafipel"
```

There is a very simple plotting method for the chemical structure.

``` r
plot(caffeine)
```

![Chemical structure of caffeine](reference/figures/README-unnamed-chunk-4-1.png)

If you have the ISO common name of a pesticide active ingredient, you
can use the ‘pai’ class derived from the ‘chent’ class, which first
queries the [BCPC compendium](http://www.bcpcpesticidecompendium.org/).

``` r
delta <- pai$new("Deltamethrin")
#> Querying BCPC for Deltamethrin ...
#> Querying PubChem for inchikey OWZREIFADZCYQD-NSHGMRRFSA-N ...
#> Get chemical information from RDKit using PubChem SMILES
#> CC1([C@H]([C@H]1C(=O)O[C@H](C#N)C2=CC(=CC=C2)OC3=CC=CC=C3)C=C(Br)Br)C
plot(delta)
```

![Chemical structure of deltamethrin](reference/figures/README-unnamed-chunk-5-1.png)

Additional information can be read from a local .yaml file. This
information can be used, e.g., by the
[PEC_soil](https://pkgdown.jrwb.de/pfm/reference/PEC_soil.html) function
of the ‘pfm’ package. However, this functionality is to be superseded by
a dedicated package defining data for the environmental risk assessment
of chemicals, in particular of active ingredients of plant protection
products.

## Installation

You can conveniently install chents from the repository kindly made
available by the R-Universe project:

``` r
install.packages("chents",
  repos = c("https://jranke.r-universe.dev", "https://cran.r-project.org"))
```

To benefit from the cheminformatics features, you need to install RDKit
and its Python bindings. On a Debian-type Linux distribution, use

``` sh
sudo apt install python3-rdkit
```

If you use this package on Windows or macOS, I would be happy to include
installation instructions here if you share them with me, e.g. via a
pull request.

## Configuration of the Python version to use

On Debian-type Linux distributions, you can use the following line in
your global or project-specific `.Rprofile` file to tell the
`reticulate` package to use the system Python version, which will find
the RDKit installed in the system location.

``` r
Sys.setenv(RETICULATE_PYTHON="/usr/bin/python3")
```
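To check that the setting took effect, something like the following can be run in a fresh R session. It is a sketch using reticulate's own helpers `py_config()` and `py_module_available()`; the expected interpreter path assumes the Debian setup described above, with `RETICULATE_PYTHON` set before Python is first initialised.

```r
library(reticulate)

# Which Python interpreter did reticulate bind to?
# With the .Rprofile setting above this should be /usr/bin/python3
cfg <- py_config()
print(cfg$python)

# Can the rdkit module be imported from that interpreter?
rdkit_ok <- py_module_available("rdkit")
if (rdkit_ok) {
  rdkit <- import("rdkit")
  message("RDKit version: ", rdkit$`__version__`)
} else {
  message("RDKit is not available from the configured Python")
}
```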

## Using R6 classes

Note that the `chent` objects defined by this package are
[R6](https://r6.r-lib.org/articles/Introduction.html) classes.
Therefore, assigning a `chent` object to a new name does not create a
copy: only the reference is copied, so both names still refer to the
same object. For example, you can create a molecule without retrieving
data from PubChem:

``` r
but <- chent$new("Butane", smiles = "CCCC", pubchem = FALSE)
#> Get chemical information from RDKit using user SMILES
#> CCCC
print(but)
#> <chent>
#> Identifier $identifier Butane 
#> InChI Key $inchikey NA 
#> SMILES string $smiles:
#>   user 
#> "CCCC" 
#> Molecular weight $mw: 58.1
```

If you then assign a new name and add PubChem information to the object
with the new name, the information will also be added to the original
`chent` object:

``` r
but_pubchem <- but
but_pubchem$try_pubchem()
#> Querying PubChem for name Butane ...
print(but)
#> <chent>
#> Identifier $identifier Butane 
#> InChI Key $inchikey IJDNQMDRQITEOD-UHFFFAOYSA-N 
#> SMILES string $smiles:
#>    user PubChem 
#>  "CCCC"  "CCCC" 
#> Molecular weight $mw: 58.1 
#> PubChem synonyms (up to 10):
#>  [1] "BUTANE"              "n-Butane"            "106-97-8"           
#>  [4] "Diethyl"             "Methylethylmethane"  "Butanen"            
#>  [7] "Butani"              "Butyl hydride"       "HC 600"             
#> [10] "A 21 (lowing agent)"
```

You can create a derived, independent object using the `clone()` method
that will not be affected by operations on the original object:

``` r
but_new <- chent$new("Butane", smiles = "CCCC", pubchem = FALSE)
#> Get chemical information from RDKit using user SMILES
#> CCCC
but_clone <- but_new$clone()
but_new$try_pubchem()
#> Querying PubChem for name Butane ...
but_clone
#> <chent>
#> Identifier $identifier Butane 
#> InChI Key $inchikey NA 
#> SMILES string $smiles:
#>   user 
#> "CCCC" 
#> Molecular weight $mw: 58.1
```
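The same reference-versus-clone behaviour can be reproduced without network access or RDKit using a plain R6 class. `Entity` and `set_inchikey()` below are hypothetical stand-ins for `chent` and `try_pubchem()`; they are not part of chents.

```r
library(R6)

# Hypothetical toy class standing in for a chent object
Entity <- R6Class("Entity",
  public = list(
    identifier = NULL,
    inchikey = NA_character_,
    initialize = function(identifier) {
      self$identifier <- identifier
    },
    # Modifies the object in place, like try_pubchem() in the example above
    set_inchikey = function(key) {
      self$inchikey <- key
      invisible(self)
    }
  )
)

a <- Entity$new("Butane")
b <- a                # copies only the reference
a_clone <- a$clone()  # creates an independent copy

b$set_inchikey("IJDNQMDRQITEOD-UHFFFAOYSA-N")

a$inchikey        # changed through b
#> [1] "IJDNQMDRQITEOD-UHFFFAOYSA-N"
a_clone$inchikey  # the clone is unaffected
#> [1] NA
```

Note that `clone()` is shallow by default: fields that hold other R6 objects are still shared between original and clone. `clone(deep = TRUE)` copies those as well.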

# Package index

## R6 Class definitions and methods

- [`chent`](https://pkgdown.jrwb.de/chents/reference/chent.md) : An R6
  class for chemical entities with associated data
- [`pai`](https://pkgdown.jrwb.de/chents/reference/pai.md) : An R6 class
  for pesticidal active ingredients and associated data
- [`ppp`](https://pkgdown.jrwb.de/chents/reference/ppp.md) : R6 class
  for a plant protection product with at least one active ingredient
- [`draw_svg.chent()`](https://pkgdown.jrwb.de/chents/reference/draw_svg.chent.md)
  : Draw SVG graph from a chent object using RDKit
- [`plot(`*`<chent>`*`)`](https://pkgdown.jrwb.de/chents/reference/plot.chent.md)
  : Plot method for chent objects
- [`print(`*`<chent>`*`)`](https://pkgdown.jrwb.de/chents/reference/print.chent.md)
  : Printing method for chent objects
- [`print(`*`<pai>`*`)`](https://pkgdown.jrwb.de/chents/reference/print.pai.md)
  : Printing method for pai objects (pesticidal active ingredients)
- [`print(`*`<ppp>`*`)`](https://pkgdown.jrwb.de/chents/reference/print.ppp.md)
  : Printing method for ppp objects (plant protection products)
