From Wikipedia, the free encyclopedia
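A pseudogene (pseudo- meaning "false") is a gene-like segment of a chromosome. Its sequence usually resembles that of a corresponding gene, but it has lost at least part of that gene's function: for example, it cannot be expressed, or the protein it encodes is non-functional.[3]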
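It is generally thought that a pseudogene starts out as a gene whose function is not essential for the organism's survival. As mutations accumulate, defects such as premature stop codons and frameshift mutations arise in the coding region, and the gene gradually becomes a non-functional pseudogene. Copy-number variation (CNV), in which DNA segments longer than 1 kb (kilobase pairs) are duplicated or deleted, can also give rise to pseudogenes.[4] Some pseudogenes have neither introns nor promoters; these are thought to have been transferred to the chromosome through reverse transcription of mRNA and are called "processed pseudogenes".[5] Other pseudogenes, however, retain features of normal genes, such as promoter elements like CpG islands, and RNA splice sites.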
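As a minimal sketch of the two coding-region defects named above, the following Python snippet scans a coding sequence for in-frame premature stop codons and flags a length that is not a multiple of three, which is one possible sign of a frameshifting indel. The function names and the toy sequences are hypothetical, and the check assumes the standard genetic code and a sequence that starts in frame; it is an illustration, not a pseudogene annotation method.

```python
# Stop codons of the standard genetic code (DNA alphabet).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_premature_stops(cds: str) -> list[int]:
    """Return 0-based codon indices of in-frame stop codons that occur
    before the last complete codon of the sequence."""
    seq = cds.upper()
    n_codons = len(seq) // 3
    return [
        i for i in range(n_codons - 1)
        if seq[3 * i:3 * i + 3] in STOP_CODONS
    ]


def has_frameshift_length(cds: str) -> bool:
    """True when the length is not a multiple of 3, which is consistent
    with (but does not prove) a frameshifting insertion or deletion."""
    return len(cds) % 3 != 0


# Hypothetical toy sequences, not real genes.
intact = "ATGGCTTGGAAATAA"     # ATG GCT TGG AAA TAA
nonsense = "ATGGCTTGAAAATAA"   # TGG -> TGA introduces an early stop
deleted = "ATGGCTGGAAATAA"     # one base deleted from `intact`

print(find_premature_stops(intact), has_frameshift_length(intact))
print(find_premature_stops(nonsense), has_frameshift_length(nonsense))
print(find_premature_stops(deleted), has_frameshift_length(deleted))
```

A real analysis would compare the candidate against its parent gene, since a single sequence cannot show where an indel occurred.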
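The term "pseudogene" was first proposed by Jacq et al. in 1977.[6] For a long time biologists regarded pseudogenes as non-functional junk DNA, but research in recent years has shown that pseudogenes, like other non-coding segments, can regulate gene expression. This regulatory role is important for maintaining an organism's physiological activity, and some pseudogenes also play an important part in the development of certain diseases.[7]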
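In evolutionary biology, sequence analysis of these genes that lost their function during evolution is particularly valuable, and it has long been a way for researchers to reconstruct evolutionary history. A pseudogene generally retains some features of its source gene. According to evolutionary theory, two closely related species share a common ancestor. Aligning and analysing pseudogene sequences can therefore test whether two species share a common ancestor, and can be used to estimate when the two species began to diverge (to a precision on the order of a million years).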
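The divergence-time estimate described above can be sketched in Python as follows. This is a simplified illustration, not the method of any cited study: it assumes the Jukes-Cantor substitution model (a technique chosen here for brevity), gap-free sequences that are already aligned, and a neutral substitution rate supplied by the caller. The function names are hypothetical, and any rate value passed in is an assumption that has to come from independent calibration.

```python
import math


def jukes_cantor_distance(seq_a: str, seq_b: str) -> float:
    """Estimated substitutions per site between two aligned, gap-free
    sequences, using the Jukes-Cantor correction d = -3/4 ln(1 - 4p/3)."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must have the same non-zero length")
    mismatches = sum(a != b for a, b in zip(seq_a.upper(), seq_b.upper()))
    p = mismatches / len(seq_a)  # observed proportion of differing sites
    if p >= 0.75:
        raise ValueError("too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def divergence_time_years(distance: float, rate_per_site_per_year: float) -> float:
    """T = d / (2 * mu): the factor 2 reflects that both lineages
    accumulate substitutions independently after the split."""
    return distance / (2 * rate_per_site_per_year)


# Hypothetical aligned fragments differing at 1 of 10 sites.
d = jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAA")
# The rate below is an assumed placeholder, not a measured value.
print(d, divergence_time_years(d, rate_per_site_per_year=1e-9))
```

Pseudogenes suit this kind of calculation because, once non-functional, they are expected to accumulate mutations at close to the neutral rate; the result is only as good as the assumed rate and model.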
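Pseudogenes are usually characterized by a combination of homology to a known gene and loss of some functionality. That is, although every pseudogene has a DNA sequence similar to that of some functional gene, it is usually unable to produce a functional final protein product. Because of these two requirements, pseudogenes are sometimes difficult to identify and characterize in genomes: homology and loss of function are usually inferred from sequence alignments rather than demonstrated biologically.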
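To make the homology criterion concrete, here is a small Python sketch that computes percent identity between a candidate and a known gene from an existing pairwise alignment. It assumes the two strings are already aligned to equal length with "-" marking gaps; the function name and the toy sequences are hypothetical, and no identity threshold is implied by the source.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of matching positions among alignment columns in which
    neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns (insertions or deletions)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical alignment: one mismatch and one gap column.
print(percent_identity("ATGGC-TGA", "ATGACGTGA"))
```

High identity alone shows only the homology half of the definition; loss of function still has to be argued from disabling mutations or from experiment.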
Pseudogenes for RNA genes are usually more difficult to discover as they do not need to be translated and thus do not have "reading frames".
Pseudogenes can complicate molecular genetic studies. For example, amplification of a gene by PCR may simultaneously amplify a pseudogene that shares similar sequences. This is known as PCR bias or amplification bias. Similarly, pseudogenes are sometimes annotated as genes in genome sequences.
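Processed pseudogenes often pose a problem for gene prediction programs, which frequently misidentify them as real genes or exons. It has been proposed that identifying processed pseudogenes can help improve the accuracy of gene prediction methods.[8]
Recently, 140 human pseudogenes were shown to be translated.[9] The function of their protein products, however, remains unknown.
Based on their mechanisms of origin and their characteristics, pseudogenes can be divided broadly into four classes: processed pseudogenes, non-processed pseudogenes, unitary pseudogenes, and pseudo-pseudogenes.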
Processed (or retrotransposed) pseudogenes. In higher eukaryotes, particularly mammals, retrotransposition is a fairly common event that has had a huge impact on the composition of the genome. For example, an estimated 30–44% of the human genome consists of repetitive elements such as SINEs and LINEs (see retrotransposons).[10][11] In the process of retrotransposition, a portion of the mRNA or hnRNA transcript of a gene is spontaneously reverse transcribed back into DNA and inserted into chromosomal DNA. Although retrotransposons usually create copies of themselves, it has been shown in an in vitro system that they can create retrotransposed copies of random genes, too.[12] Once these pseudogenes are inserted back into the genome, they usually contain a poly-A tail, and usually have had their introns spliced out; these are both hallmark features of cDNAs. However, because they are derived from an RNA product, processed pseudogenes also lack the upstream promoters of normal genes; thus, they are considered "dead on arrival", becoming non-functional pseudogenes immediately upon the retrotransposition event.[13] Nevertheless, these insertions occasionally contribute exons to existing genes, usually via alternatively spliced transcripts.[14] A further characteristic of processed pseudogenes is common truncation of the 5' end relative to the parent sequence, which is a result of the relatively non-processive retrotransposition mechanism that creates processed pseudogenes.[15] Processed pseudogenes are continually being created in primates.[16] In human populations, for example, individuals carry distinct sets of processed pseudogenes.[17]
Non-processed (or duplicated) pseudogenes. Gene duplication is another common and important process in the evolution of genomes. A copy of a functional gene may arise as a result of a gene duplication event caused by homologous recombination at, for example, repetitive SINE sequences on misaligned chromosomes, and may subsequently acquire mutations that cause the copy to lose the original gene's function. Duplicated pseudogenes usually have all the same characteristics as genes, including an intact exon-intron structure and regulatory sequences. The loss of a duplicated gene's functionality usually has little effect on an organism's fitness, since an intact functional copy still exists. According to some evolutionary models, shared duplicated pseudogenes indicate the evolutionary relatedness of humans and the other primates.[18] If pseudogenization is due to gene duplication, it usually occurs in the first few million years after the gene duplication, provided the gene has not been subjected to any selection pressure.[19] Gene duplication generates functional redundancy, and it is not normally advantageous to carry two identical genes. Mutations that disrupt either the structure or the function of either of the two genes are not deleterious and will not be removed through the selection process. As a result, the gene that has been mutated gradually becomes a pseudogene and will be either unexpressed or functionless. This kind of evolutionary fate is shown by population genetic modeling[20][21] and also by genome analysis.[19][22] Depending on the evolutionary context, these pseudogenes will either be deleted or become so distinct from the parental genes that they are no longer identifiable. Relatively young pseudogenes can be recognized due to their sequence similarity.[23]
Unitary pseudogenes. Various mutations (such as indels and nonsense mutations) can prevent a gene from being normally transcribed or translated, and thus the gene may become less functional, non-functional, or "deactivated". These are the same mechanisms by which non-processed genes become pseudogenes; the difference in this case is that the gene was not duplicated before pseudogenization. Normally, such a pseudogene would be unlikely to become fixed in a population, but various population effects, such as genetic drift, a population bottleneck, or, in some cases, natural selection, can lead to fixation. The classic example of a unitary pseudogene is the gene that presumably encoded the enzyme L-gulono-γ-lactone oxidase (GULO) in primates. In all mammals studied besides primates (except guinea pigs), GULO aids in the biosynthesis of ascorbic acid (vitamin C), but it exists as a disabled gene (GULOP) in humans and other primates.[24][25] Another, more recent example of a disabled gene links the deactivation of the caspase 12 gene (through a nonsense mutation) to positive selection in humans.[26]
It has been shown that processed pseudogenes accumulate mutations faster than non-processed pseudogenes.[27]
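The distinguishing features of the three classes described above can be summarised as a toy decision rule. The Python sketch below is only an illustration of that reasoning: the `Candidate` fields, the `classify` function, and the rule order are hypothetical, and the text itself says these features hold "usually", so real annotation pipelines cannot rely on rules this simple.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical summary of evidence about one non-functional locus."""
    has_introns: bool                # exon-intron structure retained
    has_poly_a_tail: bool            # poly-A tract at the 3' end
    functional_copy_in_genome: bool  # an intact parent or paralog exists


def classify(c: Candidate) -> str:
    # No functional copy remains: the gene was disabled without a prior
    # duplication, which matches the unitary case.
    if not c.functional_copy_in_genome:
        return "unitary"
    # Intronless with a poly-A tail: hallmarks of a reverse-transcribed
    # mRNA, which matches the processed case.
    if not c.has_introns and c.has_poly_a_tail:
        return "processed"
    # Exon-intron structure retained alongside a functional copy:
    # consistent with a duplicated (non-processed) pseudogene.
    if c.has_introns:
        return "non-processed"
    return "unclassified"


print(classify(Candidate(has_introns=False, has_poly_a_tail=True,
                         functional_copy_in_genome=True)))
```

The rule set deliberately leaves ambiguous cases as "unclassified" rather than forcing a label, for example an intronless copy of a parent gene with no detectable poly-A tail.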
Pseudo-pseudogenes. The rapid proliferation of DNA sequencing technologies has led to the identification of many apparent pseudogenes using gene prediction techniques. Pseudogenes are often identified by the appearance of a premature stop codon in a predicted mRNA sequence, which would, in theory, prevent synthesis (translation) of the normal protein product of the original gene. There have been some reports of translational readthrough of such premature stop codons in mammals, as reviewed in the "Translational readthrough" section of the stop codon article. A small amount of the protein product of such readthrough may still be recognizable and function at some level. If so, the pseudogene can be subject to natural selection. That appears to have happened during the evolution of Drosophila species, as described next.
In 2016 it was reported that four predicted pseudogenes in multiple Drosophila species actually encode proteins with biologically important functions,[28] "suggesting that such 'pseudo-pseudogenes' could represent a widespread phenomenon". For example, one such functional protein (an olfactory receptor) is found only in neurons. This finding of tissue-specific, biologically functional genes that could have been dismissed as pseudogenes by in silico analysis complicates the analysis of sequence data. As of 2012, there appeared to be approximately 12,000–14,000 pseudogenes in the human genome,[29] almost comparable to the oft-cited approximate value of 20,000 genes in the human genome. This work may also help to explain why humans are able to live with 20 to 100 putative homozygous loss-of-function mutations in their genomes.[30]
A 2016 reanalysis of over 50 million peptides generated from the human proteome and separated by mass spectrometry indicated that at least 19,262 human proteins are produced from 16,271 genes or clusters of genes. This analysis identified eight new protein-coding genes that had previously been considered pseudogenes.[31]
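Pseudogenes are also found in bacterial genomes.[47] The bacteria that carry them are usually symbionts or intracellular parasites, which do not need some of the genes required by bacteria living in complex external environments. An extreme example is the genome of Mycobacterium leprae, the pathogen that causes leprosy, which has been reported to have 1,133 pseudogenes accounting for approximately 50% of its transcriptome.[48]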