ENCODE is the Encyclopedia of DNA Elements. The project was launched in 2003 to identify all the functional elements (working bits) in the human genome. The work was done by over 400 scientists in 32 laboratories in the US, UK, Spain, Singapore and Japan. In 2012 their findings were published in 30 open-access papers in three journals: Nature, Genome Biology and Genome Research. At the time, it was the most detailed analysis ever made of the human genome.
A simplified account of their main findings is as follows:
- Only 1% of the genome codes for proteins. That is about 21,000 genes.
- There are about 70,000 'promoter' regions. They lie just upstream of genes, where proteins bind to control gene expression.
- There are about 400,000 'enhancer' regions which regulate distant genes.
- There are about four million gene 'switches'. These are DNA sequences which control when genes are switched on or off. They are often a long way along the genome from the gene they control.
- About 80% of the genome has a definite biochemical function. The idea that most of the DNA is "junk DNA" is definitely wrong. "The vast majority of the human genome does not code for proteins and, until now, did not seem to contain defined gene-regulatory elements. Why evolution would maintain large amounts of 'useless' DNA had remained a mystery, and seemed wasteful. It turns out, however, that there are good reasons to keep this DNA. Results from the ENCODE project show that most of these stretches of DNA harbour regions that bind proteins and RNA molecules, bringing these into positions from which they cooperate with each other to regulate the function and level of expression of protein-coding genes".[1]
- Evolution happens through changes both in the genes which code for proteins and in the DNA which regulates those genes.
- "One of the great challenges in evolutionary biology is to understand how differences in DNA sequence between species determine differences in their phenotypes. Evolutionary change may occur both through changes in protein-coding sequences and through sequence changes that alter gene regulation".[2]
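The counts in the list above are easier to compare as rough averages per gene. The short Python sketch below does that arithmetic. The genome size of about 3.1 billion base pairs is an assumption which is not given in the list, and the name `per_gene` is only for illustration. The averages are very rough, because the categories overlap and the elements are not spread evenly among genes.

```python
# Counts quoted in the list of findings above (all approximate).
protein_coding_genes = 21_000
promoter_regions = 70_000
enhancer_regions = 400_000
gene_switches = 4_000_000

# Assumption, not from the list: the human genome has
# about 3.1 billion base pairs.
genome_base_pairs = 3_100_000_000
coding_fraction = 0.01  # "only 1% of the genome codes for proteins"


def per_gene(count, genes=protein_coding_genes):
    """Average number of elements per protein-coding gene."""
    return count / genes


coding_base_pairs = genome_base_pairs * coding_fraction

print(f"Promoters per gene: {per_gene(promoter_regions):.1f}")
print(f"Enhancers per gene: {per_gene(enhancer_regions):.1f}")
print(f"Switches per gene:  {per_gene(gene_switches):.0f}")
print(f"Protein-coding DNA: about {coding_base_pairs / 1e6:.0f} million base pairs")
```

On these figures there are roughly 3 promoters, 19 enhancers and 190 switches for every protein-coding gene.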
The methods used for the work included:
- They isolated and sequenced the RNA transcribed from the genome.
- They identified the DNA binding sites for about 120 transcription factors (proteins which control gene expression).
- They examined patterns of chemical modification made to histones. This was to find regions where gene expression is boosted or suppressed.
- They did 1,648 experiments on 147 cell types.
News items connected with this work are:
- An individual patient's cancer has about 50 genetic mutations, but the mutations differ between individuals.
- A biotech company, Life Technologies, has announced a machine the size of a laser printer which can sequence a human genome in a day or two instead of weeks. It costs $149,000. Other companies are rushing to put their products on the market. One implication is that, within a decade or two, local schools and medical practices might use such machines. They could be used to find genetic markers for a range of inherited defects.
The information on this page comes mainly from two sources:
- Maher, Brendan 2012. ENCODE: The human encyclopaedia. Nature 489 (7414) 46–48.
- Walsh, Fergus 2012. ENCODE: The human encyclopaedia. BBC News Sci & Environment.