Question: Counts from cuffdiff VS featurecounts
gravatar for marina.v.yurieva
4.1 years ago by
Farmington, CT
marina.v.yurieva520 wrote:

I tried to get the information about read counts from the BAM files using both cuffdiff (isoforms.count_tracking file) and featurecounts using the same BAM files and gtf file and for several transcripts they show completely different results. Is it normal or something is going wrong?

I tried to run featurecounts with different parameters: with or without duplicates, stranded and non stranded but the result still doesn't match cuffdiff counts.

rna-seq counts cufflinks • 2.4k views
ADD COMMENTlink modified 4.1 years ago by geek_y11k • written 4.1 years ago by marina.v.yurieva520
gravatar for igor
4.1 years ago by
United States
igor11k wrote:

Cuffdiff counts are not actually raw counts, as opposed to featureCounts counts. Check this earlier discussion: How do you extract raw counts from CUFFLINKS?

From Cuffnorm manual (probably applies to Cuffdiff as well):

Cuffnorm will report both FPKM values and normalized, estimates for the number of fragments that originate from each gene, transcript, TSS group, and CDS group. Note that because these counts are already normalized to account for differences in library size, they should not be used with downstream differential expression tools that require raw counts as input.

ADD COMMENTlink modified 4.1 years ago • written 4.1 years ago by igor11k

Yes, I saw it but can it be so different that one transcript has 0 counts in featurecounts and several thousands in cuffdiff counts? And then what to use cuffdiff counts for? Clearly you cannot input them in edgeR or DESeq right?

ADD REPLYlink written 4.1 years ago by marina.v.yurieva520

Not sure why the difference is so high.

Cuffdiff counts could be used as an alternative to FPKMs, but not normalized to gene length.

ADD REPLYlink written 4.1 years ago by igor11k
gravatar for geek_y
4.1 years ago by
geek_y11k wrote:

Cuffdiff tries to estimates the counts and resolves ambiguity when the reads overlaps different transcripts of the same gene. Featurecounts might simply discard all the reads that have an ambiguous overlaps with different transcripts/genes. If you use the '-O' option in featureCounts, you would see some counts.

Do you have IGV screen shot of the transcript of interest and how the reads map there ? That would clearly show why there are differences. Use 'expanded' section for annotations and 'squished' and 'view as pairs' for the reads if you plan to post the IGV screenshot.

ADD COMMENTlink modified 4.1 years ago • written 4.1 years ago by geek_y11k
Please log in to add an answer.


Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1199 users visited in the last hour