How To Predict Pseudogenes In A Genome
5
17
11.2 years ago
kun ▴ 170

I need to predict pseudogenes in a eukaryotic species. I have the genome sequence, and gene prediction has already been done.

I have learned that there are two programs, PseudoPipe (paper) and PPFINDER. However, PseudoPipe needs some information from the Ensembl MySQL database, and I did not use the Ensembl pipeline to predict the genes. I also have problems using PPFINDER, which needs synteny information with another species.

I would like to know whether there is a convenient way to predict pseudogenes. Thank you!

genome gene • 14k views
14
11.2 years ago
Philippe ★ 1.9k

Hi,

I have never used any of these programs, but some simple methods can help you predict some pseudogenes. Pseudogenes derive from protein-coding genes, so you can scan your genome for sequences homologous to the known protein-coding genes it contains (basically looking for non-annotated paralogous sequences). This can be done with tools such as BLAST or BLAT.
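As a rough illustration of that scan, here is a minimal Python sketch that keeps only the hits falling outside annotated genes. It assumes the search was already run and saved in BLAST's default tabular format (`-outfmt 6`, where columns 9 and 10 are the subject start and end). The function names and the coordinates in the example are made up for illustration.

```python
from typing import List, Tuple

# (sequence id, start, end), 1-based inclusive coordinates
Interval = Tuple[str, int, int]


def parse_blast_tabular(text: str) -> List[Interval]:
    """Extract subject intervals from BLAST tabular output (-outfmt 6)."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        sseqid = fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        # Minus-strand hits are reported with start > end, so normalise them.
        hits.append((sseqid, min(sstart, send), max(sstart, send)))
    return hits


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals lie on the same sequence and share a base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def unannotated_hits(hits: List[Interval], genes: List[Interval]) -> List[Interval]:
    """Keep hits that do not overlap any annotated gene."""
    return [h for h in hits if not any(overlaps(h, g) for g in genes)]


example = (
    "protA\tchr1\t92.0\t300\t24\t0\t1\t300\t1000\t1900\t1e-50\t200\n"
    "protA\tchr1\t70.0\t280\t84\t2\t5\t290\t9000\t8100\t1e-35\t150\n"
)
annotated = [("chr1", 900, 2000)]  # hypothetical gene model for protA itself
candidates = unannotated_hits(parse_blast_tabular(example), annotated)
print(candidates)
```

The first hit is the annotated gene matching itself and is dropped. The second one is the kind of non-annotated homologous region worth inspecting further.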

Once this is done, you have to check whether these sequences have kept their protein-coding ability: typically, look for frameshifts or ORF disruptions, which can be due to mutations of start/stop codons or to indels. A sequence homologous to a protein-coding gene but without an intact ORF is likely to be a pseudogene.
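A minimal sketch of such an ORF check in Python, assuming you have already extracted the candidate region as a nucleotide string aligned to the parent coding sequence, and that the standard genetic code applies. The labels returned by `orf_problems` are my own, and the length test is only a crude frameshift indicator (compensating indels would go unnoticed).

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def orf_problems(cds: str) -> list:
    """Return a list of reasons why `cds` is not an intact ORF (empty if intact)."""
    cds = cds.upper()
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length_not_multiple_of_3")
    # Translate complete codons only; unknown codons (N, gaps) become 'X'.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    protein = "".join(CODON_TABLE.get(c, "X") for c in codons)
    if not codons or codons[0] != "ATG":
        problems.append("no_start_codon")
    if "*" in protein[:-1]:
        problems.append("premature_stop")
    if not protein.endswith("*"):
        problems.append("no_terminal_stop")
    return problems


print(orf_problems("ATGAAATAA"))     # intact toy ORF
print(orf_problems("ATGTAAAAATAA"))  # stop codon in the middle
print(orf_problems("ATGAAATA"))      # one base lost at the end
```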

You can then extend your analysis by including genes from other species. A pseudogene can be a single-copy gene (without paralogs) that has been lost in a given species. The method is the same, except that you look at orthologous rather than paralogous sequences: scan your genome of interest for sequences homologous to protein-coding genes present in more or less closely related species. An addition to this method is to look for synteny. If the sequence is quite degenerate but the flanking regions (ideally containing several protein-coding genes) are conserved, this will give more confidence to your detection.
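The synteny check can be reduced to a very simple score. The sketch below assumes you already have, for the reference species, the list of genes flanking the functional gene, and for your genome, the orthologs found around the candidate locus, both expressed as shared ortholog identifiers (the identifiers here are invented).

```python
def flanking_conservation(ref_flank, target_flank):
    """Fraction of reference flanking genes that have an ortholog near the candidate."""
    ref = set(ref_flank)
    if not ref:
        return 0.0
    return len(ref & set(target_flank)) / len(ref)


# Hypothetical ortholog group identifiers around the locus of interest.
reference_neighbours = ["OG1", "OG2", "OG3", "OG4"]
candidate_neighbours = ["OG2", "OG3", "OG9", "OG4"]
print(flanking_conservation(reference_neighbours, candidate_neighbours))
```

This ignores gene order and orientation. A stricter version would also require the shared neighbours to appear in the same order on both sides of the candidate.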

A good confirmation of pseudogenes is to look at the dN/dS of these sequences. Pseudogenes generally evolve under neutral selection and therefore display a dN/dS close to 1. Nonetheless, this might be affected by how long the gene has been decaying.
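To make the dN/dS idea concrete, here is a deliberately simplified sketch in the spirit of Nei-Gojobori counting, for two codon-aligned sequences. The simplifications are mine: codons that differ at more than one position, contain gaps, or encode a stop are skipped, and changes to stop codons are counted as nonsynonymous sites. For real data, an established tool such as codeml from PAML is a better choice.

```python
import math
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_sites(codon):
    """Synonymous sites in one codon (nonsynonymous sites = 3 minus this)."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            mutant = codon[:pos] + base + codon[pos + 1:]
            if CODON_TABLE[mutant] == CODON_TABLE[codon]:
                s += 1 / 3
    return s


def jukes_cantor(p):
    """Jukes-Cantor distance; None when p is too large to be corrected."""
    if p >= 0.75:
        return None
    return abs(-0.75 * math.log(1 - 4 * p / 3))


def dn_ds(seq1, seq2):
    """Return (dN, dS, dN/dS) for two codon-aligned coding sequences."""
    syn_sites = nonsyn_sites = syn_diffs = nonsyn_diffs = 0.0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue  # gap or ambiguous base
        if CODON_TABLE[c1] == "*" or CODON_TABLE[c2] == "*":
            continue  # stop codons are ignored
        n_diff = sum(a != b for a, b in zip(c1, c2))
        if n_diff > 1:
            continue  # simplification: no averaging over mutational pathways
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        syn_sites += s
        nonsyn_sites += 3 - s
        if n_diff == 1:
            if CODON_TABLE[c1] == CODON_TABLE[c2]:
                syn_diffs += 1
            else:
                nonsyn_diffs += 1
    if syn_sites == 0 or nonsyn_sites == 0:
        return None, None, None
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    omega = dn / ds if dn is not None and ds else None
    return dn, ds, omega


print(dn_ds("ATGAAACTG", "ATGAAGCTG"))  # one synonymous change (AAA -> AAG)
```

On sequences this short the estimate is meaningless; the point is only to show what is being counted. The ratio is returned as `None` whenever dS is zero or cannot be corrected.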

I leave you a reference to a paper that uses a partially similar method (it is a bit more complex). It focuses on only one gene, but the approach is usable at a large scale.

I hope it has been helpful.

0

Thank you very much! I understand what I should do now.

0

+1. One comment: I have found that building a nucleotide phylogenetic tree is quite informative. Pseudogenes tend to break the molecular clock and sit on long branches. The tree-based method seems to have higher power than dN/dS.

0

There is no such thing as "neutral selection". I think what you mean to say is "Pseudogenes generally evolve under no selective constraint".

4
11.2 years ago
Simon ▴ 40

In addition to what Philippe said, pseudogenes often lack introns. Keep that in mind when pairing the pseudogene to its parent gene.

1

This is true for processed (retrotransposed) pseudogenes, which are mainly intronless. To my knowledge, however, exon-intron structures are conserved when the duplication occurred by segmental duplication, even though the pressure to conserve them is weaker. If you have any reference to share about this, I would be interested in reading it. Also, retrotransposition does not occur in all organisms, so this may depend on which species you are working on.
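That distinction can be turned into a rough heuristic. The sketch below assumes you know how many coding exons the parent gene has and where the aligned blocks of the candidate lie on the genome. If the parent is multi-exonic but the blocks are contiguous, the candidate looks processed; if intron-sized gaps separate them, it looks like a duplication. The 30 bp threshold and the labels are arbitrary choices of mine.

```python
def classify_pseudogene(parent_exon_count, hit_blocks, max_gap=30):
    """Guess the origin of a pseudogene from the layout of its aligned blocks.

    hit_blocks: list of (start, end) genomic coordinates, 1-based inclusive.
    max_gap: largest gap (bp) between blocks still treated as 'no intron'.
    """
    if parent_exon_count <= 1:
        # A single-exon parent gives no intron signal either way.
        return "undetermined"
    blocks = sorted(hit_blocks)
    gaps = [nxt[0] - cur[1] - 1 for cur, nxt in zip(blocks, blocks[1:])]
    if any(gap > max_gap for gap in gaps):
        return "duplicated-like"
    return "processed-like"


print(classify_pseudogene(4, [(100, 400), (401, 900)]))    # contiguous blocks
print(classify_pseudogene(4, [(100, 400), (1500, 2000)]))  # intron-sized gap
```

Other hallmarks of processed pseudogenes, such as a poly-A tract at the 3' end, are not checked here.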

0

Thank you! Perhaps processed pseudogenes make up most of the pseudogenes in mammals?

0
10.6 years ago
Ensie • 0

Hi,

I have 6 pseudogenes of my gene of interest, but one of them is remarkable, and I want to know how this pseudogene was produced. Could you guide me on this, please? Thank you in advance.

0

What do this pseudogene and the corresponding real gene look like? Most often, inaccurate annotation causes this phenomenon...

0
7.2 years ago
Kumar ▴ 130

Hi,

tBLASTn can help predict pseudogenes with specific parameters: an e-value cutoff of 1e-30 and a query coverage (qcov_hsp_perc) of 100%.
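For example, a small Python helper that assembles the corresponding BLAST+ command line. The file names are placeholders, and it assumes the genome has already been formatted as a nucleotide database with `makeblastdb`.

```python
def tblastn_command(proteins_fasta, genome_db, out_path,
                    evalue=1e-30, qcov_hsp_perc=100):
    """Build a tblastn call with the thresholds suggested above."""
    return [
        "tblastn",
        "-query", proteins_fasta,               # annotated protein sequences
        "-db", genome_db,                       # nucleotide database of the genome
        "-evalue", str(evalue),
        "-qcov_hsp_perc", str(qcov_hsp_perc),   # minimum query coverage per HSP
        "-outfmt", "6",                         # tabular output
        "-out", out_path,
    ]


# Placeholder file names, to be replaced by your own.
cmd = tblastn_command("proteins.fa", "genome_db", "hits.tsv")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once BLAST+ is installed. Note that requiring 100% query coverage is strict and will miss truncated pseudogenes, so it may be worth relaxing.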

Try this..

Thanks.

Manoj

0
7.2 years ago
cyril-cros ▴ 920

Also, looking at RNA-seq data might be helpful. You can find out whether a gene is transcribed or not, at least in the tissue of your sample! It is also useful for checking whether the annotation you are using is accurate.
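A minimal sketch of that check, assuming you already have expression estimates (for example TPM) per candidate and per sample from your quantification tool of choice. The 1.0 threshold and the identifiers are arbitrary and only for illustration.

```python
def untranscribed(tpm_by_locus, threshold=1.0):
    """Return loci whose expression stays below `threshold` in every sample."""
    return sorted(
        locus
        for locus, values in tpm_by_locus.items()
        if max(values, default=0.0) < threshold
    )


# Hypothetical TPM values for three candidates across three samples.
expression = {
    "cand1": [0.0, 0.2, 0.1],
    "cand2": [12.5, 8.0, 15.1],
    "cand3": [0.4, 3.2, 0.0],
}
print(untranscribed(expression))
```

Keep in mind that some pseudogenes are transcribed, so expression alone does not rule a candidate out. Absence of expression is just one more piece of evidence.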
