Question

Counts from cuffdiff VS featurecounts

0

Entering edit mode

7.5 years ago

marina.v.yurieva ▴ 570

I tried to get the information about read counts from the BAM files using both cuffdiff (isoforms.count_tracking file) and featurecounts using the same BAM files and gtf file and for several transcripts they show completely different results. Is it normal or something is going wrong?

I tried to run featurecounts with different parameters: with or without duplicates, stranded and non stranded but the result still doesn't match cuffdiff counts.

RNA-Seq cufflinks counts • 3.5k views

ADD COMMENT • link updated 7.5 years ago by GouthamAtla 12k • written 7.5 years ago by marina.v.yurieva ▴ 570

score 1 · Answer 1 · 2016-10-04

1

Entering edit mode

7.5 years ago

igor 13k

Cuffdiff counts are not actually raw counts, as opposed to featureCounts counts. Check this earlier discussion: How do you extract raw counts from CUFFLINKS?

From Cuffnorm manual (probably applies to Cuffdiff as well):

Cuffnorm will report both FPKM values and normalized, estimates for the number of fragments that originate from each gene, transcript, TSS group, and CDS group. Note that because these counts are already normalized to account for differences in library size, they should not be used with downstream differential expression tools that require raw counts as input.

ADD COMMENT • link 7.5 years ago by igor 13k

0

Entering edit mode

Yes, I saw it but can it be so different that one transcript has 0 counts in featurecounts and several thousands in cuffdiff counts? And then what to use cuffdiff counts for? Clearly you cannot input them in edgeR or DESeq right?

ADD REPLY • link 7.5 years ago by marina.v.yurieva ▴ 570

1

Entering edit mode

Not sure why the difference is so high.

Cuffdiff counts could be used as an alternative to FPKMs, but not normalized to gene length.

ADD REPLY • link 7.5 years ago by igor 13k

score 0 · Answer 2 · 2016-10-05

Cuffdiff tries to estimates the counts and resolves ambiguity when the reads overlaps different transcripts of the same gene. Featurecounts might simply discard all the reads that have an ambiguous overlaps with different transcripts/genes. If you use the '-O' option in featureCounts, you would see some counts.

Do you have IGV screen shot of the transcript of interest and how the reads map there ? That would clearly show why there are differences. Use 'expanded' section for annotations and 'squished' and 'view as pairs' for the reads if you plan to post the IGV screenshot.