Why Does Cufflinks With Mask (-M) Option Have Lower Fpkm For Mrna Genes?
8.5 years ago
Xianjun ▴ 300

Dear all,

I was curious how much the -M (mask file) option can improve the FPKM values from Cufflinks. The manual says:

-M/--mask-file <mask.(gtf/gff)>
Tells Cufflinks to ignore all reads that could have come from transcripts in this GTF file. We recommend including any annotated rRNA, mitochondrial transcripts other abundant transcripts you wish to ignore in your analysis in this file. Due to variable efficiency of mRNA enrichment methods and rRNA depletion kits, masking these transcripts often improves the overall robustness of transcript abundance estimates.

So I would expect that providing a mask file containing rRNA, tRNA, mt genes, etc. would decrease the "total mapped reads" (i.e. the denominator), which would lead to increased FPKM values. But what I actually see is that, for most mRNA genes, the FPKM values with the -M option are smaller than those without -M. See the attached figures (I expected most of the dots to be under the red dotted line, which is x=y).
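
For reference, the arithmetic behind this expectation can be sketched with the textbook FPKM definition. The symbols $C_g$, $L_g$, $N$ and $N_{\text{mask}}$ are introduced here for illustration only and do not come from the Cufflinks manual. Cufflinks estimates abundances with a statistical model, so this shows only the idealized scaling, not its actual computation.

```latex
\[
\mathrm{FPKM}_g = \frac{10^{9}\, C_g}{N \, L_g}
\]
\[
\mathrm{FPKM}_g^{\text{mask}}
  = \frac{10^{9}\, C_g}{(N - N_{\text{mask}})\, L_g}
  = \mathrm{FPKM}_g \cdot \frac{N}{N - N_{\text{mask}}}
  \;\ge\; \mathrm{FPKM}_g
\]
```

Here $C_g$ is the number of fragments assigned to gene $g$, $L_g$ is its transcript length in bases, $N$ is the number of mapped fragments used as the denominator, and $N_{\text{mask}}$ is the number of fragments removed by the mask. This assumes that $C_g$ is unchanged by masking and that the masked fragments really are subtracted from $N$. Under those assumptions, masking could only raise an mRNA gene's FPKM or leave it unchanged.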

I have to admit that -M does indeed greatly reduce the FPKM of rRNA genes. Still, it's a mystery why most mRNA genes have lower FPKM after applying the -M option. Has anyone observed something similar?

By the way, here are my cufflinks arguments with -M:

cufflinks --library-type fr-unstranded -o cufflink_w_M -p 8 -G /data/iGenome/Homo_sapiens/UCSC/hg19/Annotation/Genes/gencode.v13.annotation.karotyped.gtf -M /data/iGenome/Homo_sapiens/UCSC/hg19/Annotation/Genes/chrM.rRNA.tRNA.gtf --multi-read-correct accepted_hits.bam


and without -M:

cufflinks --library-type fr-unstranded -o cufflink_wo_M -p 8 -G /data/iGenome/Homo_sapiens/UCSC/hg19/Annotation/Genes/gencode.v13.annotation.karotyped.gtf --multi-read-correct accepted_hits.bam


Thanks

-Xianjun


It looks like this might be related to the options --compatible-hits-norm and --total-hits-norm. The documentation isn't super clear, but I'd suggest playing around with these options and seeing what happens.


Thanks for the clue. I am re-running cufflinks with the --compatible-hits-norm option (by default it uses --total-hits-norm). I will post an update with the results. Thanks.


Does anyone have a clue about this mystery?

8.5 years ago
Xianjun ▴ 300

Here is the plot after applying the --compatible-hits-norm option. It looks much more like what I expected. Thanks to Chris for the tip.

6.4 years ago

Hey Xianjun,

I was trying to make my own mask GTF file (hg19) but failed. Is there a chance that you could share your GTF file?

Also, could you post the complete cufflinks command that gave you the right results?

Thank you so much!