Question: featureCounts duplicate counting
0
gravatar for lkmklsmn
5.1 years ago by
lkmklsmn890
United States
lkmklsmn890 wrote:

Hi,  

I am working on a set of RNAseq samples. I am using featureCounts from the subread package in order to count fragments (not reads) falling into genomic features. I am using the command line option "--ignoreDup" to exclude duplicate reads. The results seem somewhat strange...  

Here is the summary file provided by featureCounts for two samples:  

Status  sample1.bam
Assigned        12019290
Unassigned_Ambiguity    0
Unassigned_MultiMapping 7794471
Unassigned_NoFeatures   16908358
Unassigned_Unmapped     0
Unassigned_MappingQuality       0
Unassigned_FragementLength      0
Unassigned_Chimera      0
Unassigned_Secondary    0
Unassigned_Nonjunction  0
Unassigned_Duplicate    68111548

 

Status  sample2.bam
Assigned        48247506
Unassigned_Ambiguity    0
Unassigned_MultiMapping 15519394
Unassigned_NoFeatures   67192231
Unassigned_Unmapped     0
Unassigned_MappingQuality       0
Unassigned_FragementLength      0
Unassigned_Chimera      0
Unassigned_Secondary    0
Unassigned_Nonjunction  0
Unassigned_Duplicate    0

All samples were processed in the same way and same commands were used to run featureCounts. Why does one sample have 0 unassigned duplicates and the other 68111548? This difference seems so black and white that I am afraid there is an error somewhere. What exactly does it mean to have unassigned duplicates?

 

ADD COMMENTlink modified 5.1 years ago by dbpzdbpz100 • written 5.1 years ago by lkmklsmn890
1
gravatar for dbpzdbpz
5.1 years ago by
dbpzdbpz100
Australia
dbpzdbpz100 wrote:

When you specified "--ignoreDup", featureCounts will test the FLAGS field in the SAM/BAM file to see if it is a duplicated mapping. In particular, the bit in the FLAGS being tested is 0x400 (1024). This bit is set by the aligner according to its own strategy.

Based on the summarization information you provided, it seems that the two BAM files were mapped by using different settings. Sample1.bam was generated by allowing duplicated mapping but Sample2.bam did not. You may use samtools to extract few records from both BAM files to compare their FLAGS fields (the second TAB-delimitaed field). Sample1.bam should have a lot of records where the FLAGS values are larger than 1024, but Sampl2.bam do not.

ADD COMMENTlink written 5.1 years ago by dbpzdbpz100

You were right. After using the MarkDuplicates function form Picard tools and rerunning featureCounts I was able to "recognize" duplicates in the fragment counting. However, I am not aware of using different settings for the alignment. But it is very weird that some of the samples were flagged and others werent.

Anyways...thanks for your help

ADD REPLYlink written 5.1 years ago by lkmklsmn890
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 987 users visited in the last hour