Question

How To Predict Pseudogenes In A Genome

18

12.9 years ago

kun ▴ 180

I need to predict pseudogenes in a eukaryotic species. I have the genome sequence, and gene prediction has already been done.

I have learned that there are two programs, PseudoPipe (paper) and PPFINDER. However, PseudoPipe needs some information from the Ensembl MySQL database, and I did not use the Ensembl pipeline to predict genes. I also have problems using PPFINDER, which needs synteny information from another species.

I want to know whether there is any convenient way to predict pseudogenes. Thank you!

genome gene • 15k views

ADD COMMENT • link updated 16 months ago by Ram 43k • written 12.9 years ago by kun ▴ 180

score 17 · Answer 1 · 2011-05-25

Hi,

I have never used either of these programs, but some simple methods can help you predict some pseudogenes. Pseudogenes emerge from protein-coding genes, so you can scan your genome for sequences homologous to the known protein-coding genes it contains (basically looking for unannotated paralogous sequences). This can be done using tools such as BLAST or BLAT.
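As a rough illustration of the "unannotated" part, here is a minimal Python sketch that keeps only the homology hits that do not overlap any annotated gene. The coordinates and names are made up for the example, and it assumes hits and gene models have already been reduced to (sequence, start, end) tuples with 1-based inclusive coordinates.

```python
from typing import List, Tuple

# (sequence name, start, end), 1-based inclusive coordinates (an assumption of this sketch)
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if both intervals are on the same sequence and share at least one base."""
    return a[0] == b[0] and a[1] <= b[2] and b[1] <= a[2]


def unannotated_hits(hits: List[Interval], genes: List[Interval]) -> List[Interval]:
    """Keep the hits that overlap no annotated gene: these are the pseudogene candidates."""
    return [h for h in hits if not any(overlaps(h, g) for g in genes)]


# Hypothetical data: two annotated genes and three BLAST/BLAT hits.
genes = [("chr1", 100, 500), ("chr2", 50, 300)]
hits = [("chr1", 450, 700), ("chr1", 900, 1200), ("chr2", 400, 800)]

candidates = unannotated_hits(hits, genes)
print(candidates)  # [('chr1', 900, 1200), ('chr2', 400, 800)]
```

The all-against-all comparison is fine for a few thousand hits; for a whole genome an interval tree or a tool such as bedtools would probably be a better fit.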

Once this is done, you have to check whether these sequences have kept their protein-coding ability, typically by looking for frameshifts or ORF disruptions, which can be due to mutations of start/stop codons or to indels. A sequence homologous to a protein-coding gene but without an intact ORF is likely to be a pseudogene.
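The ORF check can be sketched in a few lines of Python. This is only a minimal illustration: it assumes the candidate has already been extracted in the coding orientation of its parent gene with any introns removed, and it uses the standard genetic code. The function name and the example sequences are invented for the sketch.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def orf_problems(cds: str) -> list:
    """Return the reasons why `cds` does not look like an intact ORF (empty list if none)."""
    cds = cds.upper()
    problems = []
    if len(cds) % 3 != 0:
        problems.append("length not a multiple of 3 (possible frameshift)")
    # Split into complete codons, ignoring any trailing 1-2 bases.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons or codons[0] != "ATG":
        problems.append("missing ATG start codon")
    # Codons containing ambiguous bases are translated as X.
    protein = "".join(CODON_TABLE.get(c, "X") for c in codons)
    if "*" in protein[:-1]:
        problems.append("premature stop codon at codon %d" % (protein.index("*") + 1))
    if not protein.endswith("*"):
        problems.append("missing terminal stop codon")
    return problems


print(orf_problems("ATGGCCAAATAA"))     # intact toy ORF: []
print(orf_problems("ATGGCCTAAAAATAA"))  # ['premature stop codon at codon 3']
print(orf_problems("ATGGCAAATAA"))      # one base deleted: frameshift, no terminal stop
```

In practice the frame of a decayed copy is usually recovered from a protein-to-genome alignment against the parent protein rather than from the raw sequence, so this kind of check is best applied to the aligned region.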

You can then extend your analysis by including genes from other species. A pseudogene can be a single-copy gene (without paralogs) that has been lost in a given species. The method is the same, except that you look at orthologous rather than paralogous sequences: you scan your genome of interest for sequences homologous to protein-coding genes present in more or less closely related species. An addition to this method is to look for synteny. If the sequence is quite degenerate but the flanking regions (ideally containing several protein-coding genes) are conserved, this will give more confidence to your detection.

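A good confirmation of pseudogenes is to look at the dN/dS of these sequences. Pseudogenes generally evolve neutrally and therefore display a dN/dS close to 1. Nonetheless, this might be affected by how long the gene has been decaying: a recently inactivated copy still carries the signature of the purifying selection it experienced while functional, so its dN/dS can remain below 1.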
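For reference, one common way to estimate dN/dS from a codon alignment of the candidate and its parent gene is a counting approach of the Nei-Gojobori type with a Jukes-Cantor correction. The block below is only a sketch of that estimator, not necessarily what a given tool reports. The symbols are notation introduced here: N_d and S_d are the numbers of nonsynonymous and synonymous differences, and N and S are the numbers of nonsynonymous and synonymous sites.

```latex
\begin{align}
p_N &= \frac{N_d}{N}, & p_S &= \frac{S_d}{S}, \\
d_N &= -\frac{3}{4}\ln\!\left(1 - \frac{4}{3}\,p_N\right), &
d_S &= -\frac{3}{4}\ln\!\left(1 - \frac{4}{3}\,p_S\right), \\
\omega &= \frac{d_N}{d_S}.
\end{align}
```

Under neutral evolution the expectation is ω ≈ 1, while a gene under purifying selection typically gives ω well below 1. With short or nearly identical sequences, S_d can be close to zero and the ratio becomes unstable.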

I am leaving you a reference to a paper that uses a partially similar method (it is a bit more complex). It focuses on only one gene, but the approach is usable at a large scale.

I hope it has been helpful.

score 5 · Answer 2 · 2011-05-26

5

12.9 years ago

Simon ▴ 50

In addition to what Philippe said, pseudogenes often lack introns. Keep that in mind when pairing the pseudogene to its parent gene.

ADD COMMENT • link 12.9 years ago by Simon ▴ 50

1

This is true for processed (retrotransposed) pseudogenes, which are mainly intronless. To my knowledge, however, exon-intron structures are conserved when the duplication occurred by segmental duplication, even though the pressure to conserve them is weaker. If you have any reference to share about this, I would be interested in reading it. Also, retrotransposition does occur in all organisms; this might then depend on which species you are working on.

ADD REPLY • link 12.9 years ago by Philippe ★ 1.9k

0

Thank you! Perhaps processed pseudogenes make up most of the pseudogenes in mammals?

ADD REPLY • link 12.9 years ago by kun ▴ 180

score 1 · Answer 3 · 2011-12-25

1

12.4 years ago

Ensie ▴ 10

Hi,

I have 6 pseudogenes from my defined gene, but one of these pseudogenes is marvelous and I want to know how this pseudogene was produced. Could you please guide me on this? Thank you in advance.

ADD COMMENT • link 12.4 years ago by Ensie ▴ 10

0

What do this pseudogene and the corresponding real gene look like? Most often, this phenomenon is caused by inaccurate annotation...

ADD REPLY • link 12.4 years ago by Yumtaoist ▴ 70

score 1 · Answer 4 · 2015-06-07

1

8.9 years ago

Kumar ▴ 170

Hi,

tBLASTn can help predict pseudogenes when run with specific parameters: an e-value cutoff of 1e-30 (-evalue 1e-30) and a query coverage per HSP of 100% (-qcov_hsp_perc 100).
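A minimal Python sketch of how such a search could be assembled is shown below. The file and database names are placeholders, the tabular output format (-outfmt 6) is a choice made for this sketch, and it assumes the genome has already been formatted as a nucleotide BLAST database with makeblastdb.

```python
import shlex


def build_tblastn_command(proteins_fasta, genome_db, out_tsv):
    """Build the tblastn argument list with the cutoffs suggested above."""
    return [
        "tblastn",
        "-query", proteins_fasta,   # annotated protein sequences (placeholder name)
        "-db", genome_db,           # nucleotide BLAST database of the genome (placeholder name)
        "-evalue", "1e-30",         # e-value cutoff
        "-qcov_hsp_perc", "100",    # minimum query coverage per HSP, in percent
        "-outfmt", "6",             # tabular output, one HSP per line
        "-out", out_tsv,
    ]


cmd = build_tblastn_command("proteins.faa", "genome_db", "hits.tsv")
print(shlex.join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) once BLAST+ is installed. Note that requiring 100% coverage per HSP is strict: truncated copies, or copies whose alignment is interrupted by introns, will be missed, so the value may need relaxing.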

Try this.

Thanks

Manoj

ADD COMMENT • link updated 16 months ago by Ram 43k • written 8.9 years ago by Kumar ▴ 170

score 0 · Answer 5 · 2015-06-07

0

8.9 years ago

cyril-cros ▴ 950

Also, looking at RNA-Seq data might be helpful. You can find out whether a gene is transcribed or not, at least in the tissue your sample comes from. It is also useful because it lets you check whether the annotation you are using is accurate.
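As a rough sketch of that check, the Python snippet below converts read counts to TPM and lists the loci that fall below an expression threshold. The counts, lengths, locus names, and the 1 TPM cutoff are all assumptions made for the example. Keep in mind that reads can multi-map between a pseudogene and its parent gene, so the counts themselves need care.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalised counts rescaled to sum to 1e6."""
    rates = {g: counts[g] / lengths_bp[g] for g in counts}
    total = sum(rates.values())
    if total == 0:
        return {g: 0.0 for g in counts}
    return {g: r / total * 1e6 for g, r in rates.items()}


def untranscribed(counts, lengths_bp, min_tpm=1.0):
    """Names of loci whose TPM is below `min_tpm` (an arbitrary threshold in this sketch)."""
    values = tpm(counts, lengths_bp)
    return sorted(g for g, v in values.items() if v < min_tpm)


# Hypothetical read counts and feature lengths for two genes and one candidate locus.
counts = {"geneA": 900, "geneB": 100, "candidate1": 0}
lengths_bp = {"geneA": 1000, "geneB": 1000, "candidate1": 800}

print(untranscribed(counts, lengths_bp))  # ['candidate1']
```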

ADD COMMENT • link 8.9 years ago by cyril-cros ▴ 950