Department of Biostatistics & Bioinformatics, Duke University
“USING HIGH-THROUGHPUT DATA TO DERIVE NEW MODELS OF PROTEIN-DNA BINDING SPECIFICITY”
Transcription factors (TFs) regulate gene expression by binding to specific, short DNA sites in the promoters or enhancers of the genes they regulate. Most eukaryotic TFs are members of large protein families that share a common DNA binding domain and thus have similar binding specificities. However, paralogous TFs (i.e., members of the same family) typically perform different regulatory functions in vivo by binding to different sets of genomic sites. Current models of DNA binding specificity, such as position weight matrices (PWMs), typically cannot distinguish among the DNA sites bound by paralogous TFs. This significantly restricts our ability to computationally predict the genomic targets of individual members of TF families. I will describe new regression-based models of TF-DNA binding specificity that can capture differences among closely related TFs even when their PWMs are virtually identical. We train our models using high-throughput in vitro data from custom protein binding microarray (PBM) experiments. Our PBMs are specifically designed to cover a large number of potential TF-DNA binding sites in their native genomic context. Thus, our models can accurately predict how the genomic regions flanking putative TF binding sites might affect DNA binding specificity. Our PBM-derived regression models represent an important step toward a better understanding of DNA binding specificity within TF families.
Friday, February 8, 2013 @ 1:30 PM in Bioinformatics 105