Diffstat (limited to 'docs/index.md')
| -rw-r--r-- | docs/index.md | 181 |
1 files changed, 181 insertions, 0 deletions
diff --git a/docs/index.md b/docs/index.md
new file mode 100644
index 0000000..6eb0823
--- /dev/null
+++ b/docs/index.md
@@ -0,0 +1,181 @@
+# chents
+
+[](https://pkgdown.jrwb.de/chents/)
+[](https://jranke.r-universe.dev/chents)
+[](https://pkgdown.jrwb.de/chents/coverage/coverage.html)
+
+When working with data on chemical substances, we often need a reliable
+link between the data and the chemical identity of the substances. The R
+package **chents** provides a way to define an R object corresponding to
+a chemically defined substance (“chemical entity”) and to collect
+related information.
+
+When first defining a chemical entity, some chemical information is
+retrieved from the [PubChem](https://pubchem.ncbi.nlm.nih.gov/) website
+using the [webchem](https://docs.ropensci.org/webchem/) package.
+
+``` r
+library(chents)
+caffeine <- chent$new("Caffeine")
+#> Querying PubChem for name Caffeine ...
+#> Get chemical information from RDKit using PubChem SMILES
+#> CN1C=NC2=C1C(=O)N(C(=O)N2C)C
+```
+
+If Python and [RDKit](https://rdkit.org) (\> 2015.03) are installed and
+configured for use with the
+[reticulate](https://rstudio.github.io/reticulate/) package, some
+additional chemical information, including a 2D graph, is computed.
+
+The print method gives an overview of the information that was
+collected.
+
+``` r
+print(caffeine)
+#> <chent>
+#> Identifier $identifier Caffeine
+#> InChI Key $inchikey RYYVLZVUVIJVGH-UHFFFAOYSA-N
+#> SMILES string $smiles:
+#> PubChem
+#> "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"
+#> Molecular weight $mw: 194.2
+#> PubChem synonyms (up to 10):
+#> [1] "caffeine" "58-08-2"
+#> [3] "Guaranine" "1,3,7-Trimethylxanthine"
+#> [5] "Methyltheobromine" "Theine"
+#> [7] "Thein" "Cafeina"
+#> [9] "Caffein" "Cafipel"
+```
+
+There is a very simple plotting method for the chemical structure.
+
+``` r
+plot(caffeine)
+```
+
+
+
+If you have a so-called ISO common name of a pesticide active
+ingredient, you can use the ‘pai’ class derived from the ‘chent’ class,
+which first queries the [BCPC
+compendium](http://www.bcpcpesticidecompendium.org/).
+
+``` r
+delta <- pai$new("Deltamethrin")
+#> Querying BCPC for Deltamethrin ...
+#> Querying PubChem for inchikey OWZREIFADZCYQD-NSHGMRRFSA-N ...
+#> Get chemical information from RDKit using PubChem SMILES
+#> CC1([C@H]([C@H]1C(=O)O[C@H](C#N)C2=CC(=CC=C2)OC3=CC=CC=C3)C=C(Br)Br)C
+plot(delta)
+```
+
+
+
+Additional information can be read from a local .yaml file. This
+information can be leveraged e.g. by the
+[PEC_soil](https://pkgdown.jrwb.de/pfm/reference/PEC_soil.html) function
+of the ‘pfm’ package. However, this functionality is to be superseded by
+a dedicated package, defining data for the environmental risk assessment
+of chemicals, in particular of active ingredients of plant protection
+products.
+
+## Installation
+
+You can conveniently install chents from the repository kindly made
+available by the R-Universe project:
+
+``` r
+install.packages("chents",
+  repos = c("https://jranke.r-universe.dev", "https://cran.r-project.org"))
+```
+
+To benefit from the chemoinformatics features, you need to install RDKit
+and its Python bindings. On a Debian-type Linux distribution, just use
+
+``` sh
+sudo apt install python3-rdkit
+```
+
+If you use this package on Windows or macOS, I would be happy to include
+installation instructions here if you share them with me, e.g. via a
+Pull Request.
+
+## Configuration of the Python version to use
+
+On Debian-type Linux distributions, you can use the following line in
+your global or project-specific `.Rprofile` file to tell the
+`reticulate` package to use the system Python version that will find the
+RDKit installed in the system location.
+
+``` r
+Sys.setenv(RETICULATE_PYTHON="/usr/bin/python3")
+```
+
+## Using R6 classes
+
+Note that the `chent` objects defined by this package are instances of
+[R6](https://r6.r-lib.org/articles/Introduction.html) classes.
+Therefore, assigning such an object to a new name does not make a copy:
+both names refer to the same object, because only the reference is
+copied. For example, you can create a molecule without retrieving data
+from PubChem:
+
+``` r
+but <- chent$new("Butane", smiles = "CCCC", pubchem = FALSE)
+#> Get chemical information from RDKit using user SMILES
+#> CCCC
+print(but)
+#> <chent>
+#> Identifier $identifier Butane
+#> InChI Key $inchikey NA
+#> SMILES string $smiles:
+#> user
+#> "CCCC"
+#> Molecular weight $mw: 58.1
+```
+
+If you then assign a new name and add PubChem information to the object
+with the new name, the information will also be added to the original
+`chent` object:
+
+``` r
+but_pubchem <- but
+but_pubchem$try_pubchem()
+#> Querying PubChem for name Butane ...
+print(but)
+#> <chent>
+#> Identifier $identifier Butane
+#> InChI Key $inchikey IJDNQMDRQITEOD-UHFFFAOYSA-N
+#> SMILES string $smiles:
+#> user PubChem
+#> "CCCC" "CCCC"
+#> Molecular weight $mw: 58.1
+#> PubChem synonyms (up to 10):
+#> [1] "BUTANE" "n-Butane" "106-97-8"
+#> [4] "Diethyl" "Methylethylmethane" "Butanen"
+#> [7] "Butani" "Butyl hydride" "HC 600"
+#> [10] "A 21 (lowing agent)"
+```
+
+You can create a derived, independent object using the `clone()` method
+that will not be affected by operations on the original object:
+
+``` r
+but_new <- chent$new("Butane", smiles = "CCCC", pubchem = FALSE)
+#> Get chemical information from RDKit using user SMILES
+#> CCCC
+but_clone <- but_new$clone()
+but_new$try_pubchem()
+#> Querying PubChem for name Butane ...
+but_clone
+#> <chent>
+#> Identifier $identifier Butane
+#> InChI Key $inchikey NA
+#> SMILES string $smiles:
+#> user
+#> "CCCC"
+#> Molecular weight $mw: 58.1
+```
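
The reference semantics described in the "Using R6 classes" section of the added file do not depend on chents, PubChem or RDKit. A minimal sketch using only the R6 package, where the `Substance` class and its two fields are hypothetical stand-ins for illustration and not part of the chents API:

``` r
library(R6)

# Hypothetical stand-in for a chent-like class (an assumption, not chents code)
Substance <- R6Class("Substance",
  public = list(
    identifier = NULL,
    inchikey = NA_character_,
    initialize = function(identifier) {
      self$identifier <- identifier
    }
  )
)

orig  <- Substance$new("Butane")
alias <- orig           # plain assignment copies only the reference
copy  <- orig$clone()   # clone() creates an independent object

# Modify the object through the second name
alias$inchikey <- "IJDNQMDRQITEOD-UHFFFAOYSA-N"

orig$inchikey            # changed as well, orig and alias are one object
copy$inchikey            # still NA, the clone was not affected
identical(orig, alias)   # TRUE
identical(orig, copy)    # FALSE
```

A plain `clone()` is shallow. If a field itself holds another R6 object, R6 offers `clone(deep = TRUE)` so that the nested object is copied too.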
