Computational Chemistry, Contributed Talk (15min)
CC-013

Enzeptional: enzyme optimization via a generative language modeling-based evolutionary algorithm

Y. Nana Teukam1,2, M. Manica1, F. Grisoni2, T. Laino1*
1IBM Research Europe, Zürich, 2Eindhoven University of Technology

Enzymes are molecular machines optimized by nature to enable otherwise impossible chemical processes. Besides increased reaction rates, they offer remarkable characteristics for more sustainable reactions: mild conditions, less toxic solvents, and reduced waste. Billions of years of evolution have made enzymes extremely efficient. However, their wide adoption in industrial processes requires faster design using in-silico methodologies, a daunting task that is far from solved. Most methods operate by introducing mutations into an existing amino acid (AA) sequence, using a variety of assumptions and strategies to generate variants of the original sequence. More recently, machine learning and deep generative networks have gained popularity in protein engineering by leveraging prior knowledge of protein binders, their physicochemical properties, or their 3D structure. Here, we cast enzyme optimization as an evolutionary algorithm in which mutations are modeled via generative language modeling. Relying on language models pretrained on AA sequences, we apply transfer learning to train a scoring model on a dataset of biocatalysed chemical reactions, which is then used to drive the optimization process. Our methodology allows designing enzymes with higher biocatalytic activity, emulating the evolutionary process occurring in nature by sampling optimal sequences from a model of the underlying proteomic language.
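The optimization loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: a toy uniform amino-acid sampler stands in for the pretrained protein language model that proposes mutations, and `toy_score` stands in for the scoring model trained on biocatalysed reactions; all function names and parameters here are assumptions for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical amino acids

def propose_mutant(sequence, rng):
    """Mutate one position of the sequence. In the real method, a
    pretrained protein language model would propose the replacement
    residue; here a uniform sampler stands in (illustrative only)."""
    pos = rng.randrange(len(sequence))
    new_aa = rng.choice(AMINO_ACIDS)
    return sequence[:pos] + new_aa + sequence[pos + 1:]

def toy_score(sequence):
    """Stand-in for the reaction-conditioned scoring model; here it
    simply rewards hydrophobic residues (purely illustrative)."""
    return sum(aa in "AVILMFWY" for aa in sequence) / len(sequence)

def optimize(seed, score=toy_score, population=16, survivors=4,
             rounds=10, rng=None):
    """Evolutionary loop: propose mutants, score them, keep the best.
    Keeping the current pool in the candidate set makes the best
    score monotonically non-decreasing across rounds."""
    rng = rng or random.Random(0)
    pool = [seed]
    for _ in range(rounds):
        candidates = pool + [propose_mutant(rng.choice(pool), rng)
                             for _ in range(population)]
        pool = sorted(set(candidates), key=score, reverse=True)[:survivors]
    return pool[0]

best = optimize("MKTAYIAKQR")
```

In the full method, `propose_mutant` would sample replacements from the pretrained language model's distribution over residues, so that proposed variants stay on the manifold of plausible protein sequences rather than being drawn uniformly.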