Question: Should non-protein-coding RNAs (e.g. lncRNAs) be removed in RNA-Seq differential expression analysis?
hellocita wrote (22 months ago):

Hi, I am doing an RNA-Seq differential expression analysis, and I wonder whether non-protein-coding genes, such as lncRNAs or pseudogenes, should be removed before the analysis. The purpose is to reveal the expression differences between the control and observation groups, and to relate those differences to known biological pathways/functions.

More information: The data I used are RNA-seq data (polyA-enriched RNA sequenced on an Illumina HiSeq). I mapped the reads to the evidence-based annotation of the human genome (GRCh38), version 24 (Ensembl 83), downloaded from GENCODE. From this I obtained a list of differentially expressed genes (DEGs). I then tried to convert these DEGs from Ensembl IDs to HGNC symbols and look up their biological functions. However, I found that some of the genes, such as ENSG00000270000 and ENSG00000257155, are lncRNAs that do not have an HGNC symbol, i.e. they are not protein-coding genes.
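A minimal sketch of how one might check gene biotypes directly from the same GENCODE GTF used for mapping, instead of relying only on the HGNC symbol lookup. It assumes the standard GENCODE attribute keys (`gene_id`, `gene_type`, `gene_name`) and that GTF gene IDs carry a version suffix (e.g. `.5`) that a DEG table may or may not include; `load_gene_annotation` and `split_by_biotype` are hypothetical helpers, not part of any library.

```python
import re

# GENCODE GTF column 9 looks like:
# gene_id "ENSG00000223972.5"; gene_type "transcribed_unprocessed_pseudogene"; gene_name "DDX11L1"; ...
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def load_gene_annotation(gtf_lines):
    """Map unversioned Ensembl gene IDs to (gene_name, gene_type) using 'gene' records only."""
    genes = {}
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # ignore transcript/exon/... records
        attrs = dict(ATTR_RE.findall(fields[8]))
        gene_id = attrs["gene_id"].split(".")[0]  # drop the version suffix
        genes[gene_id] = (attrs.get("gene_name", ""), attrs.get("gene_type", ""))
    return genes


def split_by_biotype(deg_ids, genes):
    """Split DEG IDs into protein-coding and everything else (unannotated IDs count as non-coding)."""
    coding, noncoding = [], []
    for gid in deg_ids:
        _, gene_type = genes.get(gid.split(".")[0], ("", "unknown"))
        (coding if gene_type == "protein_coding" else noncoding).append(gid)
    return coding, noncoding
```

Usage would be along the lines of `with open("gencode.v24.annotation.gtf") as fh: genes = load_gene_annotation(fh)` (the decompressed file name is an assumption), followed by `split_by_biotype(my_deg_ids, genes)`. Listing the `gene_type` of the non-coding DEGs shows at a glance whether they are lncRNAs, pseudogenes, etc., before deciding what to do with them.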

I wonder if I have done something wrong :(

Thanks for your answer

michael.ante wrote (22 months ago):

Hi Hellocita,

Both genes you mentioned have an A-rich region near the 3' end of the cDNA (e.g. ENSG00000257155 / ENST00000548096). Therefore, the polyA capture/enrichment can also pull down these transcripts, which results in reads from them.
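To look for such a stretch yourself, one could scan a transcript's cDNA sequence (e.g. ENST00000548096, downloaded separately from Ensembl) for A-rich windows. The sketch below uses a sliding window; the window length (10 nt) and minimum A count (8) are illustrative assumptions, not an established cutoff for oligo-dT off-target binding, and `find_a_rich_windows` is a hypothetical helper.

```python
def find_a_rich_windows(seq, window=10, min_a=8):
    """Return 0-based start positions of windows containing at least `min_a` adenines.

    window/min_a are illustrative defaults (assumption), not a published threshold.
    """
    seq = seq.upper()
    hits = []
    if len(seq) < window:
        return hits
    count = seq[:window].count("A")  # A count in the first window
    if count >= min_a:
        hits.append(0)
    for start in range(1, len(seq) - window + 1):
        # Slide by one: add the base entering on the right, drop the base leaving on the left.
        count += (seq[start + window - 1] == "A") - (seq[start - 1] == "A")
        if count >= min_a:
            hits.append(start)
    return hits
```

Hits whose position is close to `len(seq)` (the 3' end of the cDNA) would be the most plausible candidates for capture by oligo-dT, in line with the explanation above.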

I guess you did nothing wrong.

Regarding whether to keep these genes in your analysis: you can run the DE analysis both with and without them and see how strongly these genes influence the variance/overdispersion estimates. These genes seem to be detected due to off-target effects, which may follow different statistical processes than the polyadenylated genes.
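As a rough first look at the overdispersion question, one could compare per-gene dispersion between biotypes using a simple method-of-moments estimate from the negative binomial relation Var = mu + phi * mu^2, i.e. phi = (Var - mu) / mu^2. This is only a crude sketch: DE tools such as DESeq2 or edgeR estimate and shrink dispersions differently, so the real check remains running the DE analysis twice. The input format (normalised counts for the replicates of a single condition, keyed by gene ID) and the helper names are assumptions.

```python
from statistics import mean, median, variance


def mom_dispersion(counts):
    """Method-of-moments NB dispersion: Var = mu + phi * mu^2  =>  phi = (Var - mu) / mu^2."""
    mu = mean(counts)
    if mu == 0:
        return float("nan")  # undefined for all-zero genes
    return (variance(counts) - mu) / mu ** 2  # sample variance (n - 1), needs >= 2 replicates


def median_dispersion_by_group(norm_counts, biotype):
    """norm_counts: gene -> normalised counts of one condition's replicates; biotype: gene -> type."""
    groups = {}
    for gene, counts in norm_counts.items():
        phi = mom_dispersion(counts)
        if phi == phi:  # NaN != NaN, so this skips all-zero genes
            groups.setdefault(biotype.get(gene, "unknown"), []).append(phi)
    return {group: median(values) for group, values in groups.items()}
```

If the non-coding group shows a clearly different median dispersion from the protein-coding group, that would support treating these genes with extra caution, as suggested above.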



ADD COMMENTlink written 22 months ago by michael.ante3.6k

Hi Michael, I still do not fully understand why the off-target effect is related to the non-coding RNAs I got, since off-target effects are mostly discussed in the context of siRNAs. Do you mean that these genes are called DEGs because of off-target effects during the experiment, and that one should therefore use different statistics to double-check them?

(reply by hellocita, 22 months ago)

Hi Hellocita,

I mean that the non-protein-coding genes (especially the two you mentioned in your question) are off-targets of the polyA enrichment. Your oligo-dT primer has a certain length and might also bind to internal A-rich regions of certain transcripts. The enrichment of these non-target genes might follow a different statistical process than the enrichment of the polyadenylated genes.

Therefore, I'd double-check the results of the DE analysis.
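One way to do that double-check is to run the DE analysis once on all genes and once on protein-coding genes only, then compare which genes are called significant in each run. The sketch assumes each run's results have been reduced to a dict of gene ID to adjusted p-value (None or NaN for untested genes) and uses an arbitrary 0.05 cutoff; `compare_de_runs` is a hypothetical helper.

```python
def compare_de_runs(padj_all, padj_coding_only, alpha=0.05):
    """Compare significant calls between a run on all genes and a run on protein-coding genes only."""
    # Only genes tested in both runs are compared, so excluded non-coding
    # genes are not counted as "lost" calls.
    shared = padj_all.keys() & padj_coding_only.keys()
    # NaN < alpha is False, so NaN p-values are treated as not significant.
    sig_all = {g for g in shared if padj_all[g] is not None and padj_all[g] < alpha}
    sig_coding = {g for g in shared
                  if padj_coding_only[g] is not None and padj_coding_only[g] < alpha}
    union = sig_all | sig_coding
    return {
        "both": sorted(sig_all & sig_coding),
        "only_with_noncoding": sorted(sig_all - sig_coding),
        "only_coding_run": sorted(sig_coding - sig_all),
        "jaccard": len(sig_all & sig_coding) / len(union) if union else 1.0,
    }
```

A Jaccard index close to 1 would suggest the non-coding genes barely change the calls for the protein-coding genes; a low value means their presence noticeably shifts normalisation or dispersion estimation and the results deserve a closer look.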



(reply by michael.ante, 22 months ago)

I see, thank you Michael!

(reply by hellocita, 22 months ago)