featureCounts reports a low rate of successfully assigned reads
5 weeks ago
tomas4482 ▴ 40

After finishing STAR two-step alignment, I got 62% uniquely mapped reads, but featureCounts reports only 17% successfully assigned reads.

featureCounts -T 8 -F GTF -p --countReadPairs -t exon -g gene_id -a ~/genome_ref/gencode.v38.annotation.gtf -o ~/expression/all_counts.txt *.bam

I also looked at other samples in this dataset. Many of them have a successfully assigned rate of around 20%, but the QC report is fine. According to the library preparation protocol, the library is unstranded. I also checked the BAM file in IGV, and the read distribution seems normal. (Although I don't know how to get an overview of the read peaks referring to this answer)

When I analyzed another dataset previously, this rate was around 70%.

I have 2 questions:

  1. Does a low assigned rate severely affect the quantification of gene expression? In other words, can these data be used for downstream analysis?

  2. Why does this happen? Are there any solutions or explanations?

Thank you.

RNA-seq featureCounts • 478 views

featureCounts should output a summary file. What does it look like?


There are a lot of multi-mapping reads.

For instance, in one sample Assigned is 39054053 and Unassigned_MultiMapping is 174085844, which gives a successfully assigned rate of 17.29745159%.

Every sample in this dataset has more multi-mapping reads than assigned reads.
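
To compare samples, the per-sample rates can be computed directly from the summary file. The following is a minimal sketch, assuming the usual tab-separated layout of the `.summary` file (a `Status` header row followed by one row per category, one column per BAM file). The function name `assigned_rates`, the sample name `toy.bam`, and the toy numbers are made up for illustration and are not from this dataset.

```python
import csv
import io

# Toy summary in the tab-separated layout assumed for <output>.summary.
# The sample name and the counts are invented for illustration only.
TOY_SUMMARY = (
    "Status\ttoy.bam\n"
    "Assigned\t20\n"
    "Unassigned_MultiMapping\t70\n"
    "Unassigned_NoFeatures\t10\n"
)


def assigned_rates(summary_text):
    """Return {sample: (assigned / total, assigned / (total - multi-mapping))}."""
    reader = csv.reader(io.StringIO(summary_text), delimiter="\t")
    rows = [row for row in reader if row]  # skip blank lines
    samples = rows[0][1:]
    counts = {row[0]: [int(v) for v in row[1:]] for row in rows[1:]}
    zeros = [0] * len(samples)
    rates = {}
    for i, sample in enumerate(samples):
        total = sum(values[i] for values in counts.values())
        assigned = counts.get("Assigned", zeros)[i]
        multi = counts.get("Unassigned_MultiMapping", zeros)[i]
        overall = assigned / total if total else 0.0
        rest = total - multi  # everything except the multi-mapping category
        excluding_multi = assigned / rest if rest else 0.0
        rates[sample] = (overall, excluding_multi)
    return rates


rates = assigned_rates(TOY_SUMMARY)
for sample, (overall, excluding_multi) in rates.items():
    print(f"{sample}: {overall:.1%} of all, {excluding_multi:.1%} excluding multi-mapping")
```

For a real run, the text would come from the summary file written next to the counts table (for the command above, presumably `all_counts.txt.summary`), read with `open(path).read()`. The second number is only a rough guide to how the non-multi-mapping reads fare.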


Check your annotation file and how featureCounts behaves when a read overlaps two or more features. It might be that you have a low assigned rate because your reads do not map to unique exons (or a unique gene_id).

5 weeks ago

Only 20% aligned? RNASeq should work better than that. Are you sure you are aligning to the right thing?


The 20% means that featureCounts could assign only 20% of the reads, counting those that are uniquely mapped and excluding multi-mapping or ambiguous ones. It is not the same as the STAR alignment rate.
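
As a rough worked example of that difference, the two counts and the percentage quoted earlier in the thread can be rearranged to see how the assigned rate looks once the multi-mapping category is set aside. This is only a sketch: the total is inferred from the reported percentage, not read from the actual summary file, and the variable names are made up for illustration.

```python
# Values quoted earlier in the thread for one sample.
assigned = 39_054_053
multi_mapping = 174_085_844
reported_rate = 0.1729745159  # 17.29745159% successfully assigned

# Assumption: the reported rate is assigned / total, so the total can be inferred.
total = assigned / reported_rate

# Assigned fraction among everything that is not in Unassigned_MultiMapping.
non_multi = total - multi_mapping
rate_excluding_multi = assigned / non_multi

print(f"inferred total: {total:,.0f}")
print(f"assigned, excluding multi-mapping: {rate_excluding_multi:.1%}")
```

Under that assumption the assigned fraction outside the multi-mapping category comes out at roughly three quarters, which is in the same range as the 70% seen in the other dataset. The low overall figure is therefore driven mostly by the multi-mapping category.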

5 weeks ago
tomas4482 ▴ 40

I found the reason: overrepresented sequences.

I ran some tests. There is no rRNA contamination and no adapter duplication.

So it is most likely a problem with library construction.
