Chemistry and the Environment, Contributed Talk (15min)
EV-025

enviRule: An End-to-end System for Automatic Extraction of Reaction Patterns from Environmental Contaminant Biotransformation Pathways

K. Zhang1,2, K. Fenner1,2*
1Department of Environmental Chemistry, Swiss Federal Institute of Aquatic Science and Technology (Eawag), 2Department of Chemistry, University of Zürich

Pesticides are used worldwide in large quantities and are now regarded as major environmental pollutants. Pesticide transformation products (TPs), formed in the environment, can have similar or even more serious adverse environmental effects than their parent pesticides. However, the experimental characterization of TPs is time-consuming and labor-intensive. Several tools for the in silico prediction of TPs have therefore been developed over the past few decades (e.g., envipath.org). One key challenge in developing in silico prediction tools is the extraction of reaction rules from biotransformation reaction databases. So far, in the majority of existing tools, rule extraction has been done manually or only semi-automatically, requiring manual intervention at several points in the rule extraction procedure. The specificity of rules extracted this way is typically arbitrary and unvalidated, so the resulting rules are likely to produce too many false positives or too few true positives. Additionally, databases of biotransformation reactions are constantly growing, and adapting existing rules to cover new reactions quickly becomes intractable for manually extracted and curated rules. In our project, we developed an end-to-end automatic rule generation tool called enviRule, which does not require any manual intervention for rule generation or adaptation. enviRule consists of three main functional modules: a reaction clusterer, a reaction adder, and a rule generator. It clusters biotransformation reactions into groups based on the similarity of their reaction fingerprints, and then extracts and generalizes a reaction pattern for each reaction group as SMIRKS. The specificity of each rule is adjusted automatically through a feedback loop until the rule achieves an acceptable ratio of true positives to false positives.
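The specificity feedback loop can be illustrated with a minimal sketch. It is not the enviRule implementation: the integer "specificity level", the `evaluate` callback, the `min_ratio` threshold, and the toy counts are all assumptions introduced here for illustration, since the abstract does not state how specificity is parameterized or which ratio is targeted.

```python
from typing import Callable, Optional, Tuple


def tune_specificity(
    evaluate: Callable[[int], Tuple[int, int]],
    min_ratio: float = 2.0,
    max_level: int = 5,
) -> Optional[int]:
    """Return the lowest specificity level whose TP/FP ratio reaches min_ratio.

    `evaluate(level)` is a hypothetical callback: it applies the rule at the
    given specificity level to a reference set of reactions and returns
    (true_positives, false_positives).
    """
    for level in range(max_level + 1):
        tp, fp = evaluate(level)
        if tp == 0:
            # The rule no longer reproduces any known reaction, so making it
            # even more specific cannot help (assumption: TPs never increase).
            return None
        if fp == 0 or tp / fp >= min_ratio:
            return level
    return None  # no level within max_level met the target ratio


# Toy counts per level, invented for illustration: (true positives, false positives).
toy_counts = {0: (10, 40), 1: (10, 12), 2: (9, 3), 3: (6, 0)}
chosen = tune_specificity(lambda level: toy_counts[level], min_ratio=2.0, max_level=3)
print(chosen)
```

In this toy run, levels 0 and 1 are rejected because they yield more false positives than true positives, and level 2 is the first to reach the assumed ratio of 2.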
When new reactions are added, they are assigned to existing reaction groups with similar reaction fingerprints, and the rules of those groups are then updated automatically. Using enviRule drastically reduces the time required for rule extraction compared with manual rule design. Additionally, high recall and precision were achieved when the automatically extracted rules were used in combination with machine learning models. Most importantly, to the best of our knowledge, enviRule is the first tool that implements automatic rule adaptation to keep pace with the growing number of reported biotransformation reactions.
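The clustering and reaction-adding steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the enviRule algorithm: fingerprints are modeled as sets of integer feature IDs, similarity is the Tanimoto coefficient, and the greedy assignment, the `threshold` value, and the function names `assign` and `cluster` are hypothetical.

```python
from typing import FrozenSet, Iterable, List

Fingerprint = FrozenSet[int]  # assumed representation: set of "on" feature IDs


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto (Jaccard) similarity between two set-based fingerprints."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def assign(groups: List[List[Fingerprint]], fp: Fingerprint,
           threshold: float = 0.5) -> int:
    """Add fp to the most similar existing group, or open a new group.

    Returns the index of the group that received fp, i.e. the group whose
    rule would need to be regenerated.
    """
    best_idx, best_sim = -1, 0.0
    for idx, members in enumerate(groups):
        sim = max(tanimoto(fp, m) for m in members)
        if sim > best_sim:
            best_idx, best_sim = idx, sim
    if best_idx >= 0 and best_sim >= threshold:
        groups[best_idx].append(fp)
        return best_idx
    groups.append([fp])
    return len(groups) - 1


def cluster(fps: Iterable[Fingerprint], threshold: float = 0.5) -> List[List[Fingerprint]]:
    """Greedy one-pass clustering built on the same assignment step."""
    groups: List[List[Fingerprint]] = []
    for fp in fps:
        assign(groups, fp, threshold)
    return groups


# Toy fingerprints, invented for illustration.
initial = [frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 5}), frozenset({10, 11, 12})]
groups = cluster(initial)

# A newly reported reaction joins the group it resembles; an unrelated one starts a new group.
updated_group = assign(groups, frozenset({10, 11, 12, 13}))
novel_group = assign(groups, frozenset({20, 21}))
print(len(groups), updated_group, novel_group)
```

Reusing one assignment step for both the initial clustering and later additions means that a new reaction touches only a single group, so only that group's rule needs to be regenerated.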