Very low percentage of assigned reads using featureCounts
1
2
Entering edit mode
23 months ago

Dear all,

I am using featureCounts to count the number of reads per gene in an RNA-seq experiment. The problem is that I get very low number of assigned reads using featureCounts:

Assigned       3178736
Unassigned_Unmapped 0
Unassigned_MappingQuality   0
Unassigned_Chimera  0
Unassigned_FragmentLength   0
Unassigned_Duplicate    0
Unassigned_MultiMapping 0
Unassigned_Secondary    0
Unassigned_NonSplit 0
Unassigned_NoFeatures   41994410
Unassigned_Overlapping_Length   0
Unassigned_Ambiguity    77571683


I use the following:

featureCounts -M --fraction -g gene_id -T 6 -s 2 -a annotation/hg38.gtf -o counts/mycounts Alignment/*.bam


--The alignment files look fine in IGV. -- Both reference genome and annotation file are from UCSC. -- The library is reversely stranded, as I can also see in IGV.

RNA-Seq next-gen • 1.7k views
0
Entering edit mode

Are you sure your library is reversed stranded?

0
Entering edit mode

Yes, and anyway I tried -s 0 and -s 1 in the featureCounts, did not improve!

0
Entering edit mode

You are counting "multi-mapped" reads. Specific reason for doing that?

Is there a chance you have rRNA (or DNA) contamination in your data?

The alignment files look fine in IGV

What does this mean? That you see appropriate pileup of reads over exons and that there is no general scatter of alignments on the genome? Genes you expect to go up/down are behaving as expected?

0
Entering edit mode

Thanks for your reply. I didn't check for rRNA contamination. Could that be the reason? If I don't count multi-mapped reads, anyway doesn't change much the low percentage of assigned reads. About the IGV part, I meant I see nice coverage on exons, directionality of reads goes with stranded library preparation, and I see differences in reads at genes I expect so.

2
Entering edit mode
23 months ago

Many reads are mapped on regions that are not found in your annotation. It could be that your annotation is incomplete, or that you have an unusual level of extragenic transcription in your samples. But first, I would check if there are discrepancies between the chromosome name in your alignment files and the chromosome names in your annotation file. You can easily check this using:

samtools idxstats Alignment/myBam.bam | cut -f 1 | sort
cut -f 1 annotation/hg38.gtf | sort | uniq


Many reads are mapped to genomic regions that correspond to multiple gene_id in your annotation. Once again, check if your annotation make sense (you don't want the transcripts isoforms of one gene to be annotated under multiple gene_id for instance). Also, using featureCounts -O option allows you to count reads overlapping multiple features. It is not advised to do it (it causes ambiguity), but you can just use the -O option to check on what kind of genes the "ambiguous" reads are mapped.

0
Entering edit mode

Thanks for your comment. For the Unassigned_NoFeatures part, I ran your commands, and gave me similar chromosome names (chr*). I also tried to get the annotation file directly from gencode, and redo counting, did not improve much.

For the Unassigned_Ambiguity, I tried featureCounts -O, but it just improved just like from 3% to 5% of successfully assigned ones.

0
Entering edit mode

0
Entering edit mode

0
Entering edit mode

And did you check for rRNA contamination like genomax recommended? I, for example, always use sortMeRNA. You can sometimes also see this in a shift to the right of the curve in the GC content distribution plot of your FASTQC output.

0
Entering edit mode

For the Unassigned_Ambiguity, I tried featureCounts -O, but it just improved just like from 3% to 5% of successfully assigned ones.

This is weird. Using the -O option should make the number of Unassigned_Ambiguity reads drop to zero, while the number of Assigned read should, in your case, largely increase. Can you provide the detailed output of featureCounts (number of read assigned, etc) and the exact command you used ?