Question

How can I detect pseudogenes in my DESeq2 results?

4.4 years ago

sabaghianamir70

Hello

I want to identify the pseudogenes in my results list. I got my results from DESeq2, but I could also get them from edgeR or limma. Is there any tool or other way to extract them all easily?

RNA-Seq

I'm not sure what you're aiming for here.

Do you want to get pseudogenes out of your expression analysis? Or do you have a list of pseudogenes that you want to filter out of the results?

In any case, being a pseudogene is not something you can detect from an expression analysis. It's a "structural feature" of a gene, unrelated to its potential expression.

4.4 years ago by lieven.sterck

What do you mean I can't detect them in an expression analysis? I have been spotting them in my results, e.g. (pseudogene (parent gene)) GBP1P1 (GBP1) and PER4 (PER3), but I want to extract all of them from my analysis.

4.4 years ago by sabaghianamir70

I think there is a misunderstanding here. lieven.sterck thinks that you want to detect pseudogenes (without knowing whether a gene is classified as such) based only on the DEG results (i.e. a p-value or a certain fold change). I think you already know the classification of your genes and simply want to pull out all genes in your DEG list that are classified as pseudogenes. Is that correct? If not, please elaborate by giving a representative example of your output.

4.4 years ago by ATpoint

Thanks, ATpoint.

Indeed, I was (wrongly?) under the impression that you wanted to detect/identify pseudogenes de novo from your DEG results. If that is not the case (i.e. you want to filter), then something along the lines of what ATpoint suggested in their comment is likely the best way forward.

4.4 years ago by lieven.sterck

I assume you have done the DEG analysis and now want to extract the genes classified as pseudogenes in existing annotations, right? In that case I would get an annotation file (GFF or GTF) for your species and extract the genes annotated as pseudogenes. Then simply overlap those gene names with your DEG list and select the entries that match. I'm not aware of a dedicated tool for this, but it can be done with awk or a few lines of R. If you need more specific help, please post the code you've tried and we can help debug it.
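A minimal sketch of that GTF-based filtering, written in Python rather than awk or R. It assumes a GTF whose gene records carry a `gene_type` (GENCODE style) or `gene_biotype` (Ensembl style) attribute plus `gene_name`, and a DEG table keyed by gene symbol. The toy annotation lines, IDs, biotypes and DEG values below are hypothetical, for illustration only.

```python
import re

# Toy GTF excerpt (hypothetical IDs, coordinates and biotypes, illustration only).
# GENCODE GTFs store the biotype in "gene_type"; Ensembl GTFs use "gene_biotype".
GTF_TEXT = (
    'chr1\ttoy\tgene\t100\t200\t.\t-\t.\tgene_id "GENE_A"; gene_type "unprocessed_pseudogene"; gene_name "GBP1P1";\n'
    'chr1\ttoy\tgene\t300\t900\t.\t-\t.\tgene_id "GENE_B"; gene_type "protein_coding"; gene_name "GBP1";\n'
    'chr1\ttoy\ttranscript\t300\t900\t.\t-\t.\tgene_id "GENE_B"; transcript_id "TX_B1"; gene_type "protein_coding"; gene_name "GBP1";\n'
)

# Matches GTF attribute pairs such as: gene_name "GBP1P1"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def pseudogene_names(gtf_lines):
    """Return the gene_name of every gene record whose biotype mentions 'pseudogene'."""
    names = set()
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only gene records, not transcripts or exons
        attrs = dict(ATTR_RE.findall(fields[8]))
        biotype = attrs.get("gene_type") or attrs.get("gene_biotype", "")
        if "pseudogene" in biotype and "gene_name" in attrs:
            names.add(attrs["gene_name"])
    return names


# Hypothetical DEG table rows (made-up values), keyed by gene symbol.
deg_results = [
    {"gene": "GBP1P1", "log2FoldChange": 2.1, "padj": 0.001},
    {"gene": "GBP1", "log2FoldChange": 1.4, "padj": 0.01},
]

pseudo = pseudogene_names(GTF_TEXT.splitlines())
hits = [row for row in deg_results if row["gene"] in pseudo]
print([row["gene"] for row in hits])  # ['GBP1P1']
```

With a real annotation you would pass an open file handle instead of the toy string. If your results are keyed by transcript ID, the same idea applies to `transcript` records keyed on `transcript_id`.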

4.4 years ago by ATpoint

Thank you, that helps.

4.4 years ago by sabaghianamir70

Just to be sure about this:

[Image: screenshot of my DESeq2 results table]

This is my DESeq2 result. The Gene IDs are transcripts, some of which belong to genes and some to pseudogenes. Now I want to extract those annotated as pseudogenes. Given this, is the solution ATpoint mentioned correct?

4.4 years ago by sabaghianamir70

Yes, you can now filter for those annotated as pseudogenes. Still, are you sure you want to do the DEG analysis like this? DESeq2 is intended for gene-level rather than transcript-level analysis, yet you apparently ran the differential analysis on transcript counts. You would typically summarize transcript counts to the gene level before the DEG analysis; the tximport package, for example, does exactly this. I suggest you read (and ideally follow closely) the DESeq2 manual and the RNA-seq workflow from its developers (https://www.bioconductor.org/packages/devel/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html).
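tximport itself is an R/Bioconductor package, so the Python below is only a sketch of the core idea: summing per-sample transcript counts into gene counts via a transcript-to-gene table. As far as I know, tximport's default mode also passes average transcript lengths to DESeq2 as an offset, which this sketch does not do. The transcript IDs, gene IDs and counts are hypothetical.

```python
def summarize_to_gene(tx_counts, tx2gene):
    """Sum per-sample transcript counts into per-gene counts.

    tx_counts: dict transcript_id -> list of counts (one entry per sample)
    tx2gene:   dict transcript_id -> gene_id
    Returns (gene_counts, unmapped_transcript_ids).
    """
    gene_counts = {}
    unmapped = []
    for tx_id, counts in tx_counts.items():
        gene = tx2gene.get(tx_id)
        if gene is None:
            unmapped.append(tx_id)  # report rather than silently drop
            continue
        totals = gene_counts.setdefault(gene, [0.0] * len(counts))
        for i, c in enumerate(counts):
            totals[i] += c
    return gene_counts, unmapped


# Hypothetical two-sample example (made-up IDs and counts).
tx_counts = {"TX1": [10.0, 12.0], "TX2": [5.5, 4.0], "TX3": [3.0, 0.0], "TX4": [1.0, 2.0]}
tx2gene = {"TX1": "GENE_A", "TX2": "GENE_A", "TX3": "GENE_B"}

gene_counts, unmapped = summarize_to_gene(tx_counts, tx2gene)
print(gene_counts)  # {'GENE_A': [15.5, 16.0], 'GENE_B': [3.0, 0.0]}
print(unmapped)     # ['TX4']
```

Listing unmapped transcripts explicitly is a deliberate choice: a mismatch between the transcript IDs in the counts and those in the annotation (for example, from mixing genome builds) then shows up instead of quietly shrinking the gene counts.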

4.4 years ago by ATpoint

Adding to this, there is also the tximeta package (https://bioconductor.org/packages/release/bioc/vignettes/tximeta/inst/doc/tximeta.html), which essentially does what tximport does (summarizing transcripts to the gene level) but also pulls in metadata for common organisms such as human. I guess (I've never used it) this would give you the information on whether a gene is classified as a pseudogene right away. Check out its manual; it might be easier than custom filtering.

4.4 years ago by ATpoint

It's because I uploaded my own reference genome to Galaxy. If I use the default reference genome, the transcript IDs show up as Ensembl IDs. The Galaxy.eu default is based on hg19, but I used the hg38 version of the reference genome. Is this wrong?

4.4 years ago by sabaghianamir70