Scientific Retreats 2013

September 4, 2013

Session Topic: Impact of the Functionalized Genome on NIAMS Diseases – Implications from ENCODE

Introduction

In September 2012, the Encyclopedia of DNA Elements (ENCODE) Project Consortium capped off nine years of research with the simultaneous publication of 30 papers that provided the first comprehensive view of the functional elements across the entire human genome (summary paper – Reference 1). This large-scale concerted effort generated a massive amount of data of various types (Figure 1). The findings have many significant implications, from redefining what a “gene” is, to providing new clues about diseases, to piecing together how the genome works in three dimensions. One key finding is that a substantial portion of the genome, once dismissed as “junk DNA” because it lies outside the protein-coding genes, is now thought to fulfill key functions, often regulating how and when genes are turned on or off.

The ENCODE data are rapidly becoming an essential resource for researchers seeking to understand human biology and disease. For example, we now know that many of the single nucleotide polymorphisms (SNPs) identified in genome-wide association studies (GWAS) lie within regions of the genome likely to have regulatory functions, as catalogued by ENCODE. This finding also provides new clues for identifying genetic variants causally linked to disease. Recently, a team at the Broad Institute led by a NIAMS-funded investigator showed that disease-associated SNPs tend to cluster near specific markers on chromatin (the DNA-protein complex that packages DNA in the cell nucleus). One of these chromatin marks, known as H3K4me3, consistently overlapped SNPs in the cell types associated with each of the four diseases or traits under investigation (Reference 2). This result gives researchers a useful tool for sifting through the overwhelming amount of data from ENCODE and prioritizing variants when many SNPs or whole regions of the genome are associated with a disease.

Despite some pioneering work, challenges remain for NIAMS investigators to take full advantage of the ENCODE resources. For example, the high degree of cell-type specificity of regulatory elements revealed by ENCODE emphasizes the importance of having appropriate biological materials on which to test hypotheses. However, ENCODE represents only an initial exploration of the depth of our genome, and many cell types are yet to be investigated. Other challenges lie in generating and analyzing data derived from cells and tissues relevant to the disease under study, and in understanding how these functional elements affect genes that may be distantly located.

Goals of the Session

The goal of this session is to gain a better understanding of the research opportunities enabled by ENCODE and to discuss strategies for overcoming challenges in leveraging the ENCODE resources to advance NIAMS research areas. We will also look at strategies taken by other NIH Institutes and Centers (ICs) to facilitate effective use of these resources.

  • What are the key findings and resources from ENCODE that impact NIAMS research areas?
  • What are the translational implications of the functionalized genome?
  • How can NIAMS investigators fully benefit from these key resources to advance the research portfolio of NIAMS?
  • What are the challenges that prevent NIAMS investigators from taking full advantage of the valuable ENCODE resources? How can we address these challenges?
  • What analytical capabilities are required to utilize ENCODE data?

References

  1. The ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012 Sep 6;489(7414):57-74. doi: 10.1038/nature11247. PMID: 22955616

  2. Trynka G, Sandor C, Han B, Xu H, Stranger BE, Liu XS, Raychaudhuri S. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nat Genet. 2013 Feb;45(2):124-30. doi: 10.1038/ng.2504. Epub 2012 Dec 23. PMID: 23263488

  3. Ecker JR, Bickmore WA, Barroso I, Pritchard JK, Gilad Y, Segal E. Genomics: ENCODE explained. Nature. 2012 Sep 6;489(7414):52-5. doi: 10.1038/489052a. PMID: 22955614

Figure 1. Beyond the sequence (Nature 489, 52-55, 06 September 2012 – Reference 3). The ENCODE project provides information on the human genome far beyond that contained within the DNA sequence: it describes the functional genomic elements that orchestrate the development and function of a human. The project contains data about the degree of DNA methylation and chemical modifications to histones that can influence the rate of transcription of DNA into RNA molecules (histones are the proteins around which DNA is wound to form chromatin). ENCODE also examines long-range chromatin interactions, such as looping, that alter the relative proximities of different chromosomal regions in three dimensions and also affect transcription. Furthermore, the project describes the binding activity of transcription-factor proteins and the architecture (location and sequence) of gene-regulatory DNA elements, which include the promoter region upstream of the point at which transcription of an RNA molecule begins, and more distant (long-range) regulatory elements. Another section of the project was devoted to testing the accessibility of the genome to the DNA-cleavage protein DNase I. These accessible regions, called DNase I hypersensitive sites, are thought to indicate specific sequences at which the binding of transcription factors and transcription-machinery proteins has caused nucleosome displacement. In addition, ENCODE catalogues the sequences and quantities of RNA transcripts, from both non-coding and protein-coding regions.