Pseudogenes in the human genome annotation
3.4 years ago

Hi everyone,

I was wondering if anyone is familiar with an annotation term in the human gene annotations (e.g. from GENCODE or Ensembl) that can be used to extract pseudogenes and separate them from non-pseudogenes.

Any thoughts?

Thanks in advance, Sergio

human genome GRCh38 hg38 annotation • 2.5k views

GENCODE annotations include a gene_type attribute, which you can filter for values containing "pseudogene".
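As a rough illustration of that idea, here is a minimal Python sketch that splits gene records from a GENCODE-style GTF into pseudogenes and everything else, based on whether gene_type contains "pseudogene". The function names and the toy GTF lines (including the TOY_GENE IDs) are made up for illustration; only the gene_type and gene_id attribute names come from the GENCODE GTF format.

```python
import re

# Matches quoted GTF attributes such as: gene_type "processed_pseudogene";
# Unquoted attributes (e.g. "level 2;") are ignored by this pattern.
_ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def parse_attributes(field):
    """Parse the 9th GTF column into a dict (first value wins for repeated keys)."""
    attrs = {}
    for key, value in _ATTR_RE.findall(field):
        attrs.setdefault(key, value)
    return attrs


def split_genes_by_gene_type(lines):
    """Return (pseudogenes, others) as lists of (gene_id, gene_type) tuples.

    Only rows whose feature column is "gene" are considered.
    """
    pseudo, other = [], []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header comments and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":
            continue  # skip malformed rows and transcript/exon rows
        attrs = parse_attributes(cols[8])
        gene_type = attrs.get("gene_type", "")
        record = (attrs.get("gene_id", ""), gene_type)
        if "pseudogene" in gene_type:
            pseudo.append(record)
        else:
            other.append(record)
    return pseudo, other


def _row(feature, attributes):
    """Build one toy GTF row (hypothetical coordinates, illustration only)."""
    return "\t".join(["chr1", "HAVANA", feature, "100", "900", ".", "+", ".", attributes])


# Toy input, NOT real annotation data.
EXAMPLE_GTF = [
    "##description: toy example",
    _row("gene", 'gene_id "TOY_GENE_1"; gene_type "processed_pseudogene"; level 2;'),
    _row("transcript", 'gene_id "TOY_GENE_1"; gene_type "processed_pseudogene";'),
    _row("gene", 'gene_id "TOY_GENE_2"; gene_type "protein_coding"; level 2;'),
    _row("gene", 'gene_id "TOY_GENE_3"; gene_type "transcribed_unprocessed_pseudogene";'),
]

if __name__ == "__main__":
    pseudogenes, others = split_genes_by_gene_type(EXAMPLE_GTF)
    for gene_id, gene_type in pseudogenes:
        print(gene_id, gene_type, sep="\t")
```

For a real file you would pass an open file handle (for example from gzip.open(path, "rt")) instead of EXAMPLE_GTF.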


Would this also work with an Ensembl GFF?

3.4 years ago
Michael 54k

You can use Ensembl BioMart with the following query:

very long BioMart query..... Modify parameters as you like

There are many subtypes of pseudogenes; the query outputs the gene type in the last column.


Thank you for the link! Would you mind explaining the query a bit more? I do not understand how one gets the list of pseudogenes by modifying only the attributes.


The link leads to a preset that covers all types of pseudogenes: it uses a Filter on "gene type" with every type containing "pseudogene" selected, such as "translated_processed_pseudogene", "translated_unprocessed_pseudogene", etc. The link is meant as a starting point. You can adjust the filter criteria to restrict the results to different subsets of pseudogenes, or modify the attributes to extract different data columns or sequences. It is best to simply try it out.

All settings and filters are encoded in the URL and correctly applied by BioMart. However, there seems to be a bug that prevents the filter settings encoded in the URL from being displayed correctly in the web interface under "Filters". This behavior wasn't there when I posted this answer. If you check the results, they are correct anyway and contain only gene types matching *pseudogene.

To change the filter settings, click on Filter (to the left) -> check "Gene types" -> select all gene types that you wish to include.
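If you prefer a script over the web interface, the same kind of gene type filter can be written as a BioMart XML query. The following is only a sketch under several assumptions, not the query behind the link above: it assumes the martservice endpoint at www.ensembl.org, the dataset name hsapiens_gene_ensembl, the filter name biotype, and the attribute names shown, all of which you should check against the current BioMart configuration. The helper names are made up for illustration, and nothing is sent over the network unless you open the URL yourself.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed BioMart endpoint; verify before use.
MART_URL = "https://www.ensembl.org/biomart/martservice"


def build_query_xml(gene_types,
                    attributes=("ensembl_gene_id", "external_gene_name", "gene_biotype")):
    """Build a BioMart XML query that filters genes by a list of gene types.

    The dataset, filter, and attribute names are assumptions about the
    Ensembl BioMart configuration. The gene type is requested last so it
    ends up in the last output column.
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    dataset = ET.SubElement(query, "Dataset", {
        "name": "hsapiens_gene_ensembl",  # assumed dataset name
        "interface": "default",
    })
    # Several gene types are passed as one comma-separated filter value.
    ET.SubElement(dataset, "Filter", {
        "name": "biotype",  # assumed filter name
        "value": ",".join(gene_types),
    })
    for name in attributes:
        ET.SubElement(dataset, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


def build_query_url(gene_types):
    """Return a URL with the XML query URL-encoded in the 'query' parameter."""
    return MART_URL + "?" + urlencode({"query": build_query_xml(gene_types)})


if __name__ == "__main__":
    # Only the two gene types named in this thread; extend the list as needed.
    types = ["translated_processed_pseudogene", "translated_unprocessed_pseudogene"]
    print(build_query_url(types))
```

Adding or removing entries in the gene type list corresponds to ticking or unticking gene types in the Filter panel.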

2.5 years ago
Luis Nassar ▴ 650

Hello,

You can also use the Table Browser (http://genome.ucsc.edu/cgi-bin/hgTables) to extract the pseudogenes from our default gene track.

First make the following selections (for hg38):

[Screenshot: Table Browser selection]

And then select the filter button, and type pseudogene for transcriptClass:

[Screenshot: filter for pseudogene]

The output for the whole genome will be 18,578 annotations, from the GENCODE V36 models.

If you have any follow-up questions, our public help desk can always be reached at genome@soe.ucsc.edu. You may also send questions to genome-www@soe.ucsc.edu if they contain sensitive data. For any Genome Browser questions on Biostars, the UCSC tag is the best way to ensure visibility by the team.
