Question

Very low percentage of assigned reads using featureCounts

2

Entering edit mode

4.8 years ago

Meghdad ▴ 20

Dear all,

I am using featureCounts to count the number of reads per gene in an RNA-seq experiment. The problem is that I get very low number of assigned reads using featureCounts:

Assigned       3178736
Unassigned_Unmapped 0
Unassigned_MappingQuality   0
Unassigned_Chimera  0
Unassigned_FragmentLength   0
Unassigned_Duplicate    0
Unassigned_MultiMapping 0
Unassigned_Secondary    0
Unassigned_NonSplit 0
Unassigned_NoFeatures   41994410
Unassigned_Overlapping_Length   0
Unassigned_Ambiguity    77571683

I use the following:

featureCounts -M --fraction -g gene_id -T 6 -s 2 -a annotation/hg38.gtf -o counts/mycounts Alignment/*.bam

--The alignment files look fine in IGV. -- Both reference genome and annotation file are from UCSC. -- The library is reversely stranded, as I can also see in IGV.

Thanks in advance for your comments.

RNA-Seq next-gen • 4.8k views

ADD COMMENT • link updated 4.8 years ago by Carlo Yague 8.6k • written 4.8 years ago by Meghdad ▴ 20

0

Entering edit mode

Are you sure your library is reversed stranded?

ADD REPLY • link 4.8 years ago by caggtaagtat ★ 1.9k

0

Entering edit mode

Yes, and anyway I tried -s 0 and -s 1 in the featureCounts, did not improve!

ADD REPLY • link 4.8 years ago by Meghdad ▴ 20

0

Entering edit mode

You are counting "multi-mapped" reads. Specific reason for doing that?

Is there a chance you have rRNA (or DNA) contamination in your data?

The alignment files look fine in IGV

What does this mean? That you see appropriate pileup of reads over exons and that there is no general scatter of alignments on the genome? Genes you expect to go up/down are behaving as expected?

ADD REPLY • link 4.8 years ago by GenoMax 141k

0

Entering edit mode

Thanks for your reply. I didn't check for rRNA contamination. Could that be the reason? If I don't count multi-mapped reads, anyway doesn't change much the low percentage of assigned reads. About the IGV part, I meant I see nice coverage on exons, directionality of reads goes with stranded library preparation, and I see differences in reads at genes I expect so.

ADD REPLY • link 4.8 years ago by Meghdad ▴ 20

score 2 · Answer 1 · 2019-07-08

2

Entering edit mode

4.8 years ago

Carlo Yague 8.6k

About the reads Unassigned_NoFeatures

Many reads are mapped on regions that are not found in your annotation. It could be that your annotation is incomplete, or that you have an unusual level of extragenic transcription in your samples. But first, I would check if there are discrepancies between the chromosome name in your alignment files and the chromosome names in your annotation file. You can easily check this using:

samtools idxstats Alignment/myBam.bam | cut -f 1 | sort
cut -f 1 annotation/hg38.gtf | sort | uniq

About the reads Unassigned_Ambiguity

Many reads are mapped to genomic regions that correspond to multiple gene_id in your annotation. Once again, check if your annotation make sense (you don't want the transcripts isoforms of one gene to be annotated under multiple gene_id for instance). Also, using featureCounts -O option allows you to count reads overlapping multiple features. It is not advised to do it (it causes ambiguity), but you can just use the -O option to check on what kind of genes the "ambiguous" reads are mapped.

ADD COMMENT • link 4.8 years ago by Carlo Yague 8.6k

0

Entering edit mode

Thanks for your comment. For the Unassigned_NoFeatures part, I ran your commands, and gave me similar chromosome names (chr*). I also tried to get the annotation file directly from gencode, and redo counting, did not improve much.

For the Unassigned_Ambiguity, I tried featureCounts -O, but it just improved just like from 3% to 5% of successfully assigned ones.

ADD REPLY • link 4.8 years ago by Meghdad ▴ 20

0

Entering edit mode

Are your reads trimmed adapter content and base calling quality?

ADD REPLY • link 4.8 years ago by caggtaagtat ★ 1.9k

0

Entering edit mode

Yes, using Cutadapt.

ADD REPLY • link 4.8 years ago by Meghdad ▴ 20

0

Entering edit mode

And did you check for rRNA contamination like genomax recommended? I, for example, always use sortMeRNA. You can sometimes also see this in a shift to the right of the curve in the GC content distribution plot of your FASTQC output.

ADD REPLY • link 4.8 years ago by caggtaagtat ★ 1.9k

0

Entering edit mode

For the Unassigned_Ambiguity, I tried featureCounts -O, but it just improved just like from 3% to 5% of successfully assigned ones.

This is weird. Using the -O option should make the number of Unassigned_Ambiguity reads drop to zero, while the number of Assigned read should, in your case, largely increase. Can you provide the detailed output of featureCounts (number of read assigned, etc) and the exact command you used ?

ADD REPLY • link 4.8 years ago by Carlo Yague 8.6k