Question: How To Find Pseudogenes Of A Given Protein?
7.1 years ago by
Frenkiboy250 wrote:

My question is rather trivial:

Is there a resource I can use to find pseudogenes of my favorite set of genes using only identifiers (without having to use sequence comparison)?

Best regards

7.1 years ago by
Emily_Ensembl20k wrote:

One option may be to use the Ensembl API. You could write a Perl script that searched for your gene within the database, then identified all the transcripts of the gene and selected those with the biotype pseudogene.

There are instructions on downloading the API here:

There's a tutorial on using the API here:

The documentation is here:

Let me know if you need any help with this.


While some pseudogenes are transcribed, most are not, so I'm not sure where that leaves the Ensembl transcripts. HGNC curates some pseudogenes, but not many. You can cluster hypothetical ORFs, but as most should have frameshifts, stops or other transcription/translation breakers, that won't be easy.

written 7.1 years ago by cdsouthan

This script works:

use strict;
use warnings;
use Bio::EnsEMBL::Registry;

my $registry = "Bio::EnsEMBL::Registry";

# Connect to an Ensembl database server. The host is blank here;
# set it to the server you connect to before running.
$registry->load_registry_from_db(
   -host => '',
   -user => 'anonymous'
);

my $gene_adaptor = $registry->get_adaptor('Mouse', 'Core', 'Gene');

my $gene_name = "Optn";

# Fetch every gene that carries this external name (gene symbol)
my @genes = @{ $gene_adaptor->fetch_all_by_external_name($gene_name) };

while (my $gene = shift @genes){

    print $gene->external_name, ", ", $gene->stable_id, "\n";

    my @transcripts = @{ $gene->get_all_Transcripts };

    while (my $transcript = shift @transcripts) {

        # Print only transcripts whose biotype is a pseudogene category
        if ($transcript->biotype eq "processed_pseudogene"
        or $transcript->biotype eq "IG_C_pseudogene"
        or $transcript->biotype eq "IG_J_pseudogene"
        or $transcript->biotype eq "IG_V_pseudogene"
        or $transcript->biotype eq "polymorphic_pseudogene"
        or $transcript->biotype eq "pseudogene"
        or $transcript->biotype eq "unprocessed_pseudogene"
        or $transcript->biotype eq "TR_J_pseudogene"
        or $transcript->biotype eq "TR_V_pseudogene"
        ) {
            print $transcript->stable_id, "\n";
        }
    }
}

Edit to put in different genes, change what you print out etc. Also, check the possible biotypes (which, as Khader says, you can find in BioMart) and add any more that you think are relevant to your search.

modified 5 months ago by RamRS • written 7.1 years ago by Emily_Ensembl
7.1 years ago by
Sukhdeep Singh10k wrote:

What about

Welcome to [...]. The site is developed and maintained by the Yale Gerstein Group. This site contains a comprehensive database of identified pseudogenes, utilities used to find pseudogenes, various publication data sets and a pseudogene knowledgebase.

You can download the gene list or the per-chromosome list in CSV/GTF format and then cross-query it with your custom list, using R or Perl/Python.
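The cross-query step could look roughly like the Python sketch below. It assumes the downloaded file is a standard nine-column GTF and that the attribute column carries a key naming the gene of interest. The function names `parse_gtf_attributes` and `cross_query` and the default key `gene_name` are illustrative assumptions, not the site's documented format, so check the header of the file you actually download.

```python
import re

# key "value" pairs as they appear in the ninth GTF column
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')


def parse_gtf_attributes(field):
    """Turn a GTF attribute column into a dict of key -> value."""
    return dict(ATTR_RE.findall(field))


def cross_query(gtf_lines, wanted, key="gene_name"):
    """Return (chrom, start, end, attributes) for records whose `key`
    attribute matches one of the wanted names (case-insensitive).

    `key` is an assumption: use whichever attribute your file provides.
    """
    wanted = {w.upper() for w in wanted}
    hits = []
    for line in gtf_lines:
        # Skip blank lines and comment/header lines
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GTF record
        attrs = parse_gtf_attributes(cols[8])
        if attrs.get(key, "").upper() in wanted:
            hits.append((cols[0], int(cols[3]), int(cols[4]), attrs))
    return hits
```

Typical use would be `cross_query(open(path), ["Optn"])`, where `path` points at the downloaded file. Matching is by name only, so it finds nothing if the file labels pseudogenes with identifiers that differ from your list.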

This site has outdated annotations, corresponding to the May 2004 genome build (unless I couldn't find the recent ones, which is highly plausible), and it requires 2 liftovers to get the coordinates to mm9 - mm10.

written 7.1 years ago by Frenkiboy
7.1 years ago by
Manhattan, NY
Khader Shameer18k wrote:

15,017 of the 62,252 genes in the current Ensembl Genes 71 (GRCh37.p10) release are annotated with a pseudogene-related biotype.

Gene biotypes related to pseudogenes: IG_C_pseudogene, IG_J_pseudogene, IG_V_pseudogene, polymorphic_pseudogene, processed_pseudogene, pseudogene, TR_J_pseudogene and TR_V_pseudogene.

In BioMart, you can easily filter on gene or transcript biotype, starting from gene or protein IDs.
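If you export the BioMart result as a tab-separated table, the biotype check can also be done offline. The sketch below is only an illustration: the column names `Gene name`, `Gene type` and `Transcript type` are assumptions about the export header (they differ between releases), and matching on the suffix `pseudogene` is a shortcut of this example that happens to cover every biotype listed above, not an official rule.

```python
import csv
import io


def is_pseudogene_biotype(biotype):
    """True for biotypes such as processed_pseudogene or IG_V_pseudogene."""
    return biotype.strip().lower().endswith("pseudogene")


def pseudogene_rows(tsv_text, symbols,
                    symbol_col="Gene name",
                    biotype_cols=("Gene type", "Transcript type")):
    """Return the rows for the given gene symbols where the gene or the
    transcript biotype is a pseudogene category.

    Column names are assumptions; pass the ones from your own export.
    """
    wanted = {s.upper() for s in symbols}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    hits = []
    for row in reader:
        # Keep only rows for the requested gene symbols
        if (row.get(symbol_col) or "").upper() not in wanted:
            continue
        # Keep the row if any biotype column names a pseudogene category
        if any(is_pseudogene_biotype(row.get(col) or "") for col in biotype_cols):
            hits.append(row)
    return hits
```

An empty result for a symbol means that no row for that gene carries a pseudogene biotype in the exported table.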

Here is a screenshot based on my query: [image]

Answering the revised question - Pseudogenes of OPTN:

Yes, you can query BioMart using gene symbols and check whether any gene biotype or transcript biotype belongs to a pseudogene category.

For your specific gene, OPTN, the Ensembl Genes 71 (GRCh37.p10) release annotates neither the gene nor any of its transcripts as a pseudogene.



Dear Khader, thank you so much for your answer, but I am already aware that I can find a list of all of the pseudogenes in a certain genome.

Let me rephrase my question: given a gene (say Optn), is there an easy way to find its related pseudogenes?

written 7.1 years ago by Frenkiboy