Question: Is it necessary to use a GTF annotation exclusively for mRNA if the sequencing library is mRNA-enriched?
AlexAbdulkaderKheirallah (110), United Kingdom, wrote 5.6 years ago:

Hello All,

I have used the Tuxedo pipeline for RNA-seq differential gene expression analysis. For that purpose I used a cuffmerge-generated annotation (reference-based transcriptome assembly), which contains both coding and non-coding genes (~60000 genes). However, my prep is polyA-enriched (mRNA selection). My judgments are based on P-values corrected for multiple comparisons (FDR), i.e. Q-values. What concerns me a little is that the number of comparisons (all genes) is far greater than the number of genes actually represented in my library (mRNA only), which I thought might unnecessarily skew my Q-values. I was considering using a GTF with only mRNA annotation. Is this a valid concern? I know that cuffdiff doesn't do differential analysis for regions with no coverage (in both conditions); it reports these as 'NOTEST'. Are these excluded from the FDR calculation? In other words, is the FDR correction based exclusively on the number of comparisons that were actually performed?


rna-seq • 1.5k views
Devon Ryan (98k), Freiburg, Germany, wrote 5.6 years ago:

If no test is performed, then for the purposes of multiple comparisons it's as if the gene/feature was never there. In general, I wouldn't worry about a few extra genes being tested that you don't care about. Also, if you leave an expressed gene out of your GTF file, cufflinks will just assemble it anyway, so you're only making things slower with no real benefit.
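
To make the "as if it was never there" point concrete, here is the Benjamini-Hochberg adjustment written out. This assumes cuffdiff's q-values are Benjamini-Hochberg adjusted p-values, which the thread itself does not state. With m the number of tests actually performed and the p-values sorted in increasing order, the adjusted value for the i-th smallest one is:

```latex
% Assumption: q-values are Benjamini-Hochberg adjusted p-values.
% m = number of tests actually performed (untested genes are not counted).
% p_{(1)} \le p_{(2)} \le \dots \le p_{(m)} are the sorted p-values.
q_{(i)} = \min_{j \ge i} \, \min\!\left( 1, \; \frac{m \, p_{(j)}}{j} \right)
```

Only m and the tested p-values enter the formula, so a gene with no test contributes nothing. If untested genes were instead counted (say with p = 1), m would grow and the smaller q-values would get larger.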


I didn't mean repeating the transcriptome assembly with a simplified GTF; I only wanted to know whether NOTEST genes are counted in the multiple-comparison correction...

Reply written 5.6 years ago by AlexAbdulkaderKheirallah (110)

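If that is the case, there is a simple two-step way to check:

1. Take the p-values that cuffdiff reports for the genes that were actually tested and perform the multiple-testing correction yourself (p.adjust in R can do the job for you).

2. Compare your self-calculated adjusted p-values with the q-values reported by cuffdiff. If they match, the untested genes were not counted.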
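
If you'd rather not leave the command line for R, the same check can be sketched in Python. This is only a sketch under stated assumptions: it assumes a tab-separated cuffdiff output with columns named gene_id, status, p_value and q_value, that tested rows carry status OK, and that the q-values are Benjamini-Hochberg adjusted. The function names bh_adjust and recompute_q are made up for this example, and the toy table is invented data, not real output.

```python
import csv
import io


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone (each is the minimum over all larger ranks).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def recompute_q(diff_file, tested_status="OK"):
    """Return (gene_id, reported_q, recomputed_q) for tested rows only."""
    rows = [
        row
        for row in csv.DictReader(diff_file, delimiter="\t")
        if row["status"] == tested_status
    ]
    recomputed = bh_adjust([float(row["p_value"]) for row in rows])
    return [
        (row["gene_id"], float(row["q_value"]), q)
        for row, q in zip(rows, recomputed)
    ]


# Invented toy table (hypothetical values, assumed column names).
EXAMPLE = (
    "gene_id\tstatus\tp_value\tq_value\n"
    "g1\tOK\t0.01\t0.03\n"
    "g2\tOK\t0.04\t0.04\n"
    "g3\tOK\t0.03\t0.04\n"
    "g4\tNOTEST\t1\t1\n"
)

for gene, reported, mine in recompute_q(io.StringIO(EXAMPLE)):
    print(gene, reported, round(mine, 6))
```

On a real file you would pass an open file handle instead of the io.StringIO wrapper. If the recomputed values agree with the reported ones when only the tested rows are used, the untested genes were not part of the correction; running bh_adjust over all rows instead shows how much the q-values would shift if they had been.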

Reply written 5.6 years ago by Sam (3.3k)

p.adjust looks rather handy. Thanks!

Reply written 5.6 years ago by AlexAbdulkaderKheirallah (110)