Library Design for Protein Engineering 2023

Invited Talks - Abstracts

**Structures of de novo proteins from a binary-patterned library and the applications to constructing protein nanobuilding blocks**

Ryoichi Arai (Shinshu University)

Combinatorial libraries of de novo sequences can provide rich sources of diversity for the discovery of new proteins with chemically and biologically important functions. We have used binary patterning of polar and nonpolar amino acids to produce libraries of sequences that fold into stable α-helical structures (Hecht M.H. et al., Protein Sci., 2004). To isolate functional proteins from these libraries, collections of synthetic genes encoding the novel sequences can be transformed into auxotrophic strains of E. coli missing a conditionally essential gene required for growth on minimal medium (Fisher M.A. et al., PloS One, 2011). A de novo protein, Syn-F4, was isolated from such a library (Smith B.A. et al., Protein Sci., 2015) and catalyzed the hydrolysis of iron-enterobactin, a reaction essential for the survival of Δfes E. coli in iron-limited environments (Donnelly A.E. et al., Nat. Chem. Biol., 2018). Recently we solved the 3-dimensional structure of Syn-F4, and showed that its structure and biochemical mechanism differ substantially from those of natural enterobactin esterases, thereby demonstrating that proteins designed de novo can provide life-sustaining catalytic activities using structures and mechanisms dramatically different from those that arose in nature (Kurihara K. et al., Proc. Natl. Acad. Sci. U.S.A., 2023).

In addition, several years ago, we reported a dimeric 4-helix bundle structure of WA20 (Arai R. et al., J. Phys. Chem. B, 2012), a de novo protein was isolated from the same binary patterning library. To harness the unusual intertwined structure of WA20 for the self-assembly of supramolecular nanostructures, we designed and constructed a protein nanobuilding block (PN-Block), called WA20-foldon, by fusing the dimeric structure of WA20 to the trimeric foldon domain of fibritin from bacteriophage T4 (Kobayashi N. et al., J. Am. Chem. Soc., 2015). The WA20-foldon self-assembled into several oligomeric nanoarchitectures in multiples of 6-mer. Furthermore, we developed novel functional protein nanobuilding blocks (lectin nano-blocks) by fusing WA20 to a dimeric lectin, Agrocybe cylindracea galectin (ACG) (Irumagawa, S. et al., Int. J. Mol. Sci., 2022). The lectin nano-blocks self-assembled into various oligomers in multiples of 2-mer. The binding avidity of the lectin nano-block oligomers to glycans was significantly increased through multivalent effects. Lectin nano-blocks with high avidity will be useful for various applications, such as specific cell labeling.

Reconstructed evolutionary events: a smorgasbord for protein engineers

Mikael Boden (University of Queensland)

What in the sequence determines—or can be changed to alter—structural or functional qualities of my protein? Clearly sensible questions to a protein engineer, but cumbersome to address by experimentation and databases alone.

The fact that evolution provides useful instruction surprises no biologist; indeed, gleaning the space of ancestral sequences for a protein family (to which my protein belongs) can identify sequence templates robust to mutation and establish bounds of acceptable variability. Moreover, sampling (predicting, synthesizing and characterizing) ancestors at selected branch points can help understand the fitness landscape of the whole family, i.e. the “lay-of-the-land”. This in turn underpins efforts to optimize new variants relative to the potential of the family, not only seek incremental improvements relative to a base case. My group are developing methods to support these pursuits, which I will be highlighting with example applications, including on medically relevant metallo-beta lactamases and industrially potent enzymes in the branched-chain amino acid pathway acting as part of a cell-free cascade for isobutanol production.

In particular, we are developing and evaluating features that (a) help model the evolution of insertions and deletions (indels), exploring the space of indels and their impact on ancestors, (b) annotate branchpoints with properties, transferred across a tree via “modes” that follow evolutionary modelling principles, and (c) support the selection of what ancestors to reconstruct, which in some cases form templates for further mutagenesis, in turn suited to emerging protocols such as every-variant sequencing.

Looking back to the future of protein engineering: protein libraries based on inferred ancestors.

Elizabeth Gillam (University of Queensland)

In creating libraries for directed evolution of proteins, it is useful to look to natural evolution for inspiration, in terms of both the templates and methods for evolution. Using cytochrome P450 monooxygenases as a model system, we have shown that ancestral sequence reconstruction can be used to infer ancestors that are markedly more thermostable and solvent tolerant than their extant counterparts while retaining similar catalytic activities and substrate specificity. These properties make them better able than extant enzymes to withstand the harsh process conditions required for implementation as biocatalysts in chemical syntheses, e.g., elevated temperatures, higher solvent concentrations and long incubations. Importantly, this robustness also makes them more tolerant of mutations than extant proteins and thereby better starting points for further evolution by either point mutagenesis or recombinatorial evolution. In this presentation I will explore how ancestral sequence reconstruction and other computational approaches can be combined with point mutagenesis and DNA family shuffling to create small libraries enriched in highly functional and diverse proteins.

Diversity of fitness peak shapes in the Green Fluorescent Protein

Fyodor Kondrashov (OIST)

How genotypes are manifested into phenotypes, known as fitness landscapes, remains one of the most important unanswered questions in biology. However, deep mutational scans - empirical assays of the fitness landscapes - are increasingly being used to shape our understanding of the complex nature of interaction of changes on the genetic level. In my talk I will outline what we have learned from our study of the empirical assays of different orthologues of the Green Fluorescent Protein. We find that the fitness peak associated with each orthologue is unique, demonstrating different overall shape but also showing interactions between different amino acid sites. We have also been able to utilize these data and make generalized models of the fitness landscapes that are fairly accurate at genetic distances vastly beyond what has been assayed experimentally. The uniqueness of the shape of each fitness peak, however, leads to an inherent limitation of predicting the shape of one fitness peak based on the information contained in another fitness peak. To create better and more general models of fitness landscapes different empirical and theoretical approaches will be needed.

Multiscale Modeling of Enzyme Activity

Dan T. Major (Bar-llan University)

Terpene synthases (TPS) catalyze some of the most complex chemical reactions in nature. The synthetic power of these enzymes has been studied using a multitude of tools, including biochemistry, structural biology, and theory. Theoretical tools employed include gas-phase quantum chemistry methods and so-called multi-scale in-enzyme methods. Based on decades of experimental and theoretical studies of TPS, several key catalytic features of these enzyme reactions have been identified: Intrinsic reactivity of carbocations, active site contour and substrate folding, pyrophosphate-Mg²⁺-enzyme cluster activation of the C–O bond, bifacial nature of the active site, and active site acid/base functionalities.

In this talk I will show some of the multiscale modelling performed in my group in recent years. I will provide an overview over the methods typically used in such work, like our enzyme docking approach EnzyDock, the systems studied, and main catalytic features borne out of these studies. Thereafter I will present two test cases we have studied recently in-depth, namely the diterpene synthases CotB2 and hydropyrene synthase. I will discuss how a deep understanding of TPS catalysis has allowed us to rationally design new variants in silico with improved thermal stability and new catalytic function.

Accelerating discovery of substrate promiscuity in biocatalyzed oxidations

Joelle N. Pelletier (University of Montreal)

Enzymes collectively display a great breadth of catalytic properties yet are individually confined to one or a few specific catalytic tasks. Despite key advances in enzyme engineering, our capacity to predict the effects of mutations on function remains nebulous. Here we present advances in engineering non-native substrate recognition for biocatalyzed transformation into useful products. We examine cytochrome P450 oxidase from Bacillus megaterium (P450 BM3) in its capacity to functionalize C-H bonds. Cost-effective, high-throughput colorimetric screening at the whole-cell level had previously suggested a correlation between the production of indigo and increased substrate promiscuity, in a small number of P450 BM3 variants. We greatly expand the diversity of indigo-producing P450 BM3 variants and demonstrate a correlation with promiscuous aromatic hydroxylation reactions. We look ahead to the potential for large experimental datasets to train smarter design algorithms for enzyme engineering.

Ancestral proteins as scaffolds for enzyme engineering and evolution

Jose M. Sanchez-Ruiz (University of Granada)

Plausible approximations to sequences of ancestral proteins (i.e., proteins from extict organisms) can be derived from the known sequences of their modern descendants using suitable analyses. The possibility of ancestral sequence reconstruction was proposed by Linus Pauling and Emile Zuckerkandl in 1963. While this was only a theoretical possibility in the mid-twentieth century, ancestral sequence reconstruction has recently become a standard approach in molecular evolution studies, as a result of advances in phylogenetics and bioinformatics, together with the availability of very large sequence databases. In the last ～25 years, proteins encoded by reconstructed ancestral sequences (i.e., “resurrected” ancestral proteins, in the common jargon of the field) have proven to be very useful tools to address important problems in molecular evolution. Furthermore, we and others have shown that resurrected ancestral proteins may display properties that are desirable in scaffolds for enzyme engineering. These properties include high stability, as well as substrate and catalytic promiscuity, and will be described in some detail in relation with the generation of de novo enzymes, a major unsolved problem in protein engineering. Finally, I will discuss the unusual properties and biotechnological potential of ancestral TIM-barrels and the use of resurrected ancestral proteins to study viral evolution.

Library creation for protein engineering

Tuck Seng Wong^1,2,3

¹Department of Chemical & Biological Engineering, the University of Sheffield, Sir Robert Hadfield Building, Mappin Street, Sheffield S1 3JD, United Kingdom; ²National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science & Technology Development Agency (NSTDA), 113 Thailand Science Park, Phahonyothin Road, Khlong Nueng, Khlong Luang, Pathum Thani 12120, Thailand; ³School of Pharmacy, Bandung Institute of Technology, Bandung, West Java, Indonesia.

Library creation stands as a cornerstone in the realm of protein engineering, an indispensable stride within its workflow. Indeed, the efficacy of protein library construction profoundly shapes the prospects of success and the extent of resources entailed in achieving a protein variant endowed with the desired attributes. Within the scope of this presentation, my emphasis rests on two pivotal facets of protein library generation: (a) the methodologies employed for crafting such libraries, and (b) the evaluation of their quality. With the advent of error-prone polymerase chain reaction (epPCR), a plethora of library construction techniques has surfaced, exhibiting varying degrees of technical difficulty. My intention is to unveil some of the most cutting-edge methodologies, drawing apt comparisons against traditional approaches. Consequently, I will expound upon the evaluative criteria pertinent to assessing the quality of a protein library, concurrently appraising these recently formulated techniques in that light.

Engineering functional RNAs in cells and test tubes

Yohei Yokobayashi (OIST)

RNAs exhibit diverse biological functions in living cells ranging from transfer of genetic information (mRNAs), protein synthesis (ribosomes), and regulation of gene expression (microRNAs etc.). Engineering of RNAs that function in living cells and in test tubes shares many experimental approaches and challenges with protein engineering. In this talk, I will briefly discuss different experimental strategies behind our efforts to engineer functional RNAs designed to operate in different contexts (e.g., mammalian cells, bacteria, artificial cells). Topics include riboswitches that regulate gene expression in response to small molecules, experimental analysis ribozyme fitness landscapes in vitro, and co-selection of orthogonal RNA-protein complexes in vitro.