The overarching goal of our project is to develop a unified, comprehensive pipeline for the discovery and activation of bacterial natural product gene clusters, one capable of inducing molecule production from gene clusters that are recalcitrant to expression in the laboratory. The project is divided into three specific aims that will yield: (1) optimized heterologous expression strains, (2) new bioinformatics tools and methods for the identification and cloning of novel natural product gene cluster families, and (3) a semi-automated pipeline for high-throughput discovery and expression of natural products from bacterial metagenomes.


Aim 1. Select and optimize strains for improved DNA-based high-throughput natural product discovery pipelines.

Historically, host strains for heterologous natural product expression have been optimized to produce specific metabolites at high levels. As a result, current model hosts are inefficient as general discovery hosts because of their limited transcriptional promiscuity and limited substrate repertoire.

We will develop new model hosts by first empirically identifying strains with the highest natural propensity to induce secondary metabolite expression and then further optimizing these strains for DNA-based natural product discovery using synthetic methods (SMOG, small molecule expression optimized genomes).


Aim 2. Develop methods for the bioinformatic identification and cloning of previously unknown natural product gene cluster families.

Efficiently feeding novel microbial biosynthetic diversity into improved DNA-based natural product discovery pipelines is currently limited by: (1) the difficulty of computationally identifying truly novel cryptic gene cluster families (CCFs), (2) the inability to rapidly annotate biosynthetic diversity and identify gene clusters of interest with minimal sequencing effort, and (3) the difficulty of accessing complete biosynthetic gene clusters.

We will address these bottlenecks by: (1) developing a bioinformatics pipeline to identify CCFs, where a CCF is defined as a cluster that, by gene content and gene organization, does not resemble any known cluster family, (2) developing a bioinformatics standard for identifying novel natural product biosynthetic gene clusters to enable high-throughput identification of CCFs (or any other gene cluster family), together with an accompanying de novo metagenomic sequencing pipeline for arrayed DNA libraries, and (3) developing advanced methods for parallel recovery of gene clusters from metagenomic libraries.
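
As an illustration only, the CCF definition above (no resemblance to any known family by gene content or gene organization) could be sketched as follows. This is not the proposed pipeline: the similarity measure (Jaccard overlap of gene or domain labels for content, Jaccard overlap of adjacent label pairs for organization), the equal weighting, the 0.3 threshold, and all function names are hypothetical choices made for the example.

```python
from typing import Dict, List, Sequence, Set


def jaccard(a: Set, b: Set) -> float:
    """Jaccard index of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def adjacency_pairs(genes: Sequence[str]) -> Set[frozenset]:
    """Unordered neighbouring label pairs, so strand orientation is ignored."""
    return {frozenset(pair) for pair in zip(genes, genes[1:])}


def cluster_similarity(query: Sequence[str], ref: Sequence[str],
                       content_weight: float = 0.5) -> float:
    """Blend gene-content and gene-organization similarity (assumed weighting)."""
    content = jaccard(set(query), set(ref))
    organization = jaccard(adjacency_pairs(query), adjacency_pairs(ref))
    return content_weight * content + (1.0 - content_weight) * organization


def is_candidate_ccf(query: Sequence[str],
                     known_families: Dict[str, List[Sequence[str]]],
                     threshold: float = 0.3) -> bool:
    """Flag a cluster whose best match to every known family is below threshold.

    The threshold is a placeholder and would need empirical calibration.
    """
    best = max(
        (cluster_similarity(query, ref)
         for refs in known_families.values()
         for ref in refs),
        default=0.0,
    )
    return best < threshold
```

Here each cluster is represented as an ordered list of gene or domain labels (for example `["KS", "AT", "ACP", "TE"]`), and `known_families` maps a family name to its reference clusters. A real implementation would need curated reference families and a calibrated threshold.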


Aim 3. Develop a pipeline for high-throughput discovery and rapid structural novelty evaluation of natural products through increasing degrees of gene cluster manipulation.

We will leverage the tools established in Aims 1 and 2 to design a series of complementary approaches for accessing natural products from DNA sequence. The overall strategy is to develop gene cluster activation methods that retain native operon structure and use in vivo generated reagents as much as possible, based on the belief that the least intrusive methods will lead to the highest-throughput discovery approaches.

To complement the cluster discovery and activation pipeline, we will develop methods for rapid, small-scale, semi-automated structural novelty evaluation of the heterologously produced natural products directly from crude extracts.