Question: Why Does Cufflinks With Mask (-M) Option Have Lower FPKM For mRNA Genes?
Xianjun (Great Boston Area) wrote 5.9 years ago:

Dear all,

I was curious how much the -M (mask file) option can improve the FPKM from Cufflinks. The manual says:

-M/--mask-file <mask.(gtf/gff)>
Tells Cufflinks to ignore all reads that could have come from transcripts in this GTF file. We recommend including any annotated rRNA, mitochondrial transcripts other abundant transcripts you wish to ignore in your analysis in this file. Due to variable efficiency of mRNA enrichment methods and rRNA depletion kits, masking these transcripts often improves the overall robustness of transcript abundance estimates.

So I would expect that providing a mask file containing rRNA, tRNA, mt genes, etc. would decrease the "total mapped reads" (i.e. the denominator), which would lead to an increased FPKM. But what I actually see is that, for most mRNA genes, the FPKM values with the -M option are smaller than those without -M. See the attached figure (I expected most of the dots to be under the red dotted line, which is x=y).
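
The expectation above follows from the usual FPKM definition: fragments per kilobase of transcript per million mapped fragments. Here is a minimal sketch of that arithmetic. All counts are made up for illustration and do not come from this dataset, and the assumption that masked fragments are simply subtracted from the denominator is the poster's expectation, not confirmed Cufflinks behavior.

```python
def fpkm(fragments, length_bp, total_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_fragments)

# Hypothetical numbers, for illustration only
gene_fragments = 500        # fragments assigned to one mRNA gene
gene_length = 2000          # transcript length in bp
total = 20_000_000          # all mapped fragments
masked = 4_000_000          # assumed fragments from rRNA/tRNA/chrM transcripts

fpkm_no_mask = fpkm(gene_fragments, gene_length, total)
# Assumption: masking removes the masked fragments from the denominator
fpkm_with_mask = fpkm(gene_fragments, gene_length, total - masked)

print(fpkm_no_mask, fpkm_with_mask)
```

Under that assumption the same gene goes up in FPKM when the denominator shrinks, which is why lower values with -M look surprising.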

I have to admit that -M does greatly reduce the FPKM for rRNA genes. But it is still a mystery why most mRNA genes have lower FPKM after applying the -M option. Has anyone made a similar observation?

btw, here are my cufflinks arguments with -M:

cufflinks --library-type fr-unstranded -o cufflink_w_M -p 8 -G /data/iGenome/Homo_sapiens/UCSC/hg19/Annotation/Genes/gencode.v13.annotation.karotyped.gtf -M /data/iGenome/Homo_sapiens/UCSC/hg19/Annotation/Genes/chrM.rRNA.tRNA.gtf --multi-read-correct accepted_hits.bam

and without -M:

cufflinks --library-type fr-unstranded -o cufflink_wo_M -p 8 -G /data/iGenome/Homo_sapiens/UCSC/hg19/Annotation/Genes/gencode.v13.annotation.karotyped.gtf --multi-read-correct accepted_hits.bam

Thanks

-Xianjun

[Figure: Cufflinks output with -M vs. without -M]


It looks like this might be related to the options --compatible-hits-norm and --total-hits-norm. The documentation isn't super clear, but I'd suggest playing around with these options and seeing what happens.

written 5.9 years ago by Chris Cabanski

Thanks for the clue. I am re-running cufflinks with the --compatible-hits-norm option (by default it uses --total-hits-norm). I will update you with the result. Thanks.

written 5.9 years ago by Xianjun
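
For context, the Cufflinks manual describes these two options as differing in which fragments count toward the FPKM denominator: --total-hits-norm counts all mapped fragments, while --compatible-hits-norm counts only fragments compatible with some reference transcript. Below is a rough sketch of how the choice of denominator changes the value for one gene. The counts are invented, and how the mask file interacts with either mode is not modeled here.

```python
def fpkm(fragments, length_bp, denominator):
    """FPKM for one gene given the chosen normalization denominator."""
    return fragments * 1e9 / (length_bp * denominator)

# Invented counts, for illustration only
all_mapped = 20_000_000   # every mapped fragment (as with --total-hits-norm)
compatible = 12_000_000   # fragments compatible with some reference
                          # transcript (as with --compatible-hits-norm)

gene_fragments = 600
gene_length = 2500        # bp

fpkm_total = fpkm(gene_fragments, gene_length, all_mapped)
fpkm_compatible = fpkm(gene_fragments, gene_length, compatible)

print(fpkm_total, fpkm_compatible)
```

Because the compatible-fragment count can never exceed the total, the second value is at least as large as the first for the same gene.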

Does anyone have a clue about this mystery?

written 5.9 years ago by Xianjun
Xianjun (Great Boston Area) wrote 5.9 years ago:

Here is the plot after applying the --compatible-hits-norm option. It looks much more like what I expected. Thanks to Chris for the tip.

dina.hesham139 (Egypt) wrote 3.8 years ago:

Hey Xianjun

I was trying to make my own mask GTF file (hg19) but failed. Is there a chance that you could share your GTF file?

Also, could you post the complete cufflinks command that gave you the right results?

Thank you so much!
