Question

Is it necessary to use a GTF annotation exclusively for mRNA if the sequencing library is mRNA-enriched?

10.1 years ago

AlexAbdulkaderKheirallah ▴ 120

Hello All,

I used the Tuxedo pipeline for RNA-seq differential gene expression analysis. For the annotation I used the cuffmerge-generated GTF (reference-based transcriptome assembly), which contains both coding and non-coding genes (~60,000 genes). However, my library prep is polyA-enriched (mRNA selection). My judgments are based on multiple-comparison (FDR) corrected p-values, i.e. q-values. What concerns me a little is that the number of comparisons (all genes) is far greater than the number of genes actually represented in my library (mRNA only), which I thought might unnecessarily inflate my q-values. I was considering using a GTF with only mRNA annotation. Is this a valid concern? I know that cuffdiff doesn't do differential analysis for regions with no coverage in both conditions, where it reports 'NO TEST'. Are these excluded from the FDR calculation? In other words, is the FDR correction based exclusively on the number of comparisons that were actually performed?

Thanks

RNA-Seq • 2.4k views

updated 2.6 years ago by Ram 45k • written 10.1 years ago by AlexAbdulkaderKheirallah ▴ 120

Ram · Answer 1 · 2015-06-21

10.1 years ago

Devon Ryan 105k

If no test is performed, then for the purposes of multiple comparisons it's as if the gene/feature was never there. In general, I wouldn't worry about a few extra genes being tested that you don't care about. Also, if you leave an expressed gene out of your GTF file, cufflinks will just assemble it anyway, so you're only making things slower with no real benefit.
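To make the dependence on the number of tests concrete, here is the standard Benjamini-Hochberg adjustment written out. This is a sketch that assumes cuffdiff's q-values are Benjamini-Hochberg adjusted p-values, which is how the Cufflinks manual describes them; the symbols $m$, $p_{(i)}$ and $q_{(i)}$ are notation introduced here, not taken from the thread.

```latex
% p_{(1)} \le p_{(2)} \le \dots \le p_{(m)} are the sorted p-values
% of the m tests that were actually performed.
q_{(i)} = \min_{j \ge i} \, \min\!\left(1, \; \frac{m \, p_{(j)}}{j}\right)
```

Only $m$ and the ranks enter the formula, so a gene that is never tested changes nothing, while every additional tested gene increases $m$.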

updated 2.6 years ago by Ram 45k • written 10.1 years ago by Devon Ryan 105k

I didn't mean repeating the transcriptome assembly with a simplified GTF; I only wanted to know whether NO TEST genes are counted in the multiple-comparison correction...

updated 2.6 years ago by Ram 45k • written 10.1 years ago by AlexAbdulkaderKheirallah ▴ 120

If that is the case, there is a simple two-step way to check:

Take the reported p-values and perform the multiple-testing correction yourself (p.adjust in R can do the job for you).
Compare your self-calculated adjusted p-values with the q-values reported by cuffdiff.
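The two steps above could be sketched roughly as follows. This is a Python stand-in for R's `p.adjust(p, method = "BH")`, not the R call itself. It assumes the cuffdiff output is tab-separated with `test_id`, `status`, `p_value` and `q_value` columns (as in `gene_exp.diff`) and that tested rows have status `OK`. The helper names `bh_adjust` and `compare_qvalues`, the tolerance, and the toy table are made up for illustration.

```python
import csv
import io


def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values never decrease as the raw p-value increases.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def compare_qvalues(handle, tol=1e-4):
    """Recompute q-values from the tested rows only and list disagreements.

    Returns (total rows, tested rows, mismatches), where each mismatch is
    (test_id, reported q-value, recomputed q-value).
    """
    rows = list(csv.DictReader(handle, delimiter="\t"))
    tested = [r for r in rows if r["status"] == "OK"]
    recomputed = bh_adjust([float(r["p_value"]) for r in tested])
    mismatches = [
        (r["test_id"], float(r["q_value"]), q)
        for r, q in zip(tested, recomputed)
        if abs(float(r["q_value"]) - q) > tol
    ]
    return len(rows), len(tested), mismatches


# Made-up toy table (not real cuffdiff output), used only to exercise the code.
TOY_TABLE = "\n".join([
    "test_id\tstatus\tp_value\tq_value",
    "g1\tOK\t0.01\t0.04",
    "g2\tOK\t0.04\t0.0533333",
    "g3\tOK\t0.03\t0.0533333",
    "g4\tOK\t0.5\t0.5",
    "g5\tNOTEST\t1\t1",
])

if __name__ == "__main__":
    total, tested, mismatches = compare_qvalues(io.StringIO(TOY_TABLE))
    print(f"{tested} of {total} rows tested, {len(mismatches)} mismatches")
```

For a real run, pass an open file handle for the cuffdiff output instead of the toy table. If the recomputed values agree with the reported q-values when only the tested rows are used, the untested genes were not counted in the correction.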

updated 2.6 years ago by Ram 45k • written 10.1 years ago by Sam ★ 4.8k

p.adjust looks rather handy. Thanks!

updated 2.6 years ago by Ram 45k • written 10.1 years ago by AlexAbdulkaderKheirallah ▴ 120