DE Analysis of Protein Coding Genes
4.5 years ago
Pappu ★ 2.1k

Is it standard practice to include only the ~20k protein-coding genes in DE and subsequent pathway analysis?

RNA-Seq • 827 views
4.5 years ago
Emily 23k

What species are you working with? Many commonly studied species have only ~20k protein-coding genes.


Human. The GENCODE annotation gives ~60k genes.


Those ~60,000 will include protein-coding genes, non-coding RNAs (ncRNAs), pseudogenes, and other obscure transcripts, as you already know.

As for whether it is standard practice to focus on just the protein-coding genes: it is and it is not. One reason that we do it is that the protein-coding genes are better annotated and there is more literature on them. So, practically, it is just easier to interpret the results when focusing on protein-coding genes.

As an example: as you will see from my profile, I work with many different groups. I always ask people whether they want to focus on protein-coding genes, ncRNAs, or both. Some say that they would be excited to see the ncRNA results; however, when I send back the results, they (and I) are at a loss as to how to interpret them.

In the past, I have seen people use ncRNAs for, e.g., building networks and doing very focused analyses. For example, one guy at Imperial College London had found evidence of a novel ncRNA that appeared only in ER+ breast cancer, and began exploring that specific ncRNA further. It basically became detectable because the locus in question had elevated expression in cancer, i.e., a level of expression higher than that seen in normal tissue. So, it may just have been the result of 'transcriptional noise'.
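
If you do decide to restrict to protein-coding genes, the filtering itself is simple. Below is a minimal Python sketch, not a definitive implementation. It assumes a GENCODE-style GTF in which each 'gene' record carries `gene_id` and `gene_type` attributes (Ensembl's own GTFs use `gene_biotype` instead), and a count table keyed by the same gene IDs. The function names and the toy records are made up for illustration.

```python
import re
from collections import Counter

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def gene_types_from_gtf(lines):
    """Map gene_id -> gene_type using only the 'gene' records of a GTF."""
    gene_types = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # ignore transcript, exon, etc. records
        attrs = dict(ATTR_RE.findall(fields[8]))
        if "gene_id" in attrs and "gene_type" in attrs:
            gene_types[attrs["gene_id"]] = attrs["gene_type"]
    return gene_types


def keep_protein_coding(counts, gene_types):
    """Keep only the genes annotated as protein_coding."""
    return {g: c for g, c in counts.items()
            if gene_types.get(g) == "protein_coding"}


# Toy records (hypothetical IDs and coordinates, not real annotation).
toy_gtf = [
    "##description: toy example",
    'chr1\tTOY\tgene\t100\t900\t.\t+\t.\tgene_id "GENE_A"; gene_type "protein_coding"; gene_name "A";',
    'chr1\tTOY\ttranscript\t100\t900\t.\t+\t.\tgene_id "GENE_A"; transcript_id "TX_A1"; gene_type "protein_coding";',
    'chr1\tTOY\tgene\t2000\t2500\t.\t-\t.\tgene_id "GENE_B"; gene_type "lncRNA"; gene_name "B";',
    'chr2\tTOY\tgene\t50\t400\t.\t+\t.\tgene_id "GENE_C"; gene_type "processed_pseudogene"; level 2;',
]

# Toy raw counts: gene_id -> counts per sample.
toy_counts = {"GENE_A": [10, 12], "GENE_B": [0, 3], "GENE_C": [1, 0]}

gene_types = gene_types_from_gtf(toy_gtf)
type_tally = Counter(gene_types.values())  # how many genes per type
coding_counts = keep_protein_coding(toy_counts, gene_types)
```

With a real annotation you would pass an open text-mode file handle for the GTF instead of the toy list. Looking at `type_tally` first is a quick way to see how the ~60k genes split across types before deciding what to keep.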
