Question: Very low percentage of assigned reads using featureCounts
0
gravatar for Meghdad
3 months ago by
Meghdad0
Meghdad0 wrote:

Dear all,

I am using featureCounts to count the number of reads per gene in an RNA-seq experiment. The problem is that I get very low number of assigned reads using featureCounts:

Assigned       3178736
Unassigned_Unmapped 0
Unassigned_MappingQuality   0
Unassigned_Chimera  0
Unassigned_FragmentLength   0
Unassigned_Duplicate    0
Unassigned_MultiMapping 0
Unassigned_Secondary    0
Unassigned_NonSplit 0
Unassigned_NoFeatures   41994410
Unassigned_Overlapping_Length   0
Unassigned_Ambiguity    77571683

I use the following:

featureCounts -M --fraction -g gene_id -T 6 -s 2 -a annotation/hg38.gtf -o counts/mycounts Alignment/*.bam

--The alignment files look fine in IGV. -- Both reference genome and annotation file are from UCSC. -- The library is reversely stranded, as I can also see in IGV.

Thanks in advance for your comments.

rna-seq next-gen • 296 views
ADD COMMENTlink modified 3 months ago by Carlo Yague4.7k • written 3 months ago by Meghdad0

Are you sure your library is reversed stranded?

ADD REPLYlink written 3 months ago by caggtaagtat800

Yes, and anyway I tried -s 0 and -s 1 in the featureCounts, did not improve!

ADD REPLYlink written 3 months ago by Meghdad0

You are counting "multi-mapped" reads. Specific reason for doing that?

Is there a chance you have rRNA (or DNA) contamination in your data?

The alignment files look fine in IGV

What does this mean? That you see appropriate pileup of reads over exons and that there is no general scatter of alignments on the genome? Genes you expect to go up/down are behaving as expected?

ADD REPLYlink modified 3 months ago • written 3 months ago by genomax73k

Thanks for your reply. I didn't check for rRNA contamination. Could that be the reason? If I don't count multi-mapped reads, anyway doesn't change much the low percentage of assigned reads. About the IGV part, I meant I see nice coverage on exons, directionality of reads goes with stranded library preparation, and I see differences in reads at genes I expect so.

ADD REPLYlink written 3 months ago by Meghdad0
2
gravatar for Carlo Yague
3 months ago by
Carlo Yague4.7k
Belgium
Carlo Yague4.7k wrote:

About the reads Unassigned_NoFeatures

Many reads are mapped on regions that are not found in your annotation. It could be that your annotation is incomplete, or that you have an unusual level of extragenic transcription in your samples. But first, I would check if there are discrepancies between the chromosome name in your alignment files and the chromosome names in your annotation file. You can easily check this using:

samtools idxstats Alignment/myBam.bam | cut -f 1 | sort
cut -f 1 annotation/hg38.gtf | sort | uniq

About the reads Unassigned_Ambiguity

Many reads are mapped to genomic regions that correspond to multiple gene_id in your annotation. Once again, check if your annotation make sense (you don't want the transcripts isoforms of one gene to be annotated under multiple gene_id for instance). Also, using featureCounts -O option allows you to count reads overlapping multiple features. It is not advised to do it (it causes ambiguity), but you can just use the -O option to check on what kind of genes the "ambiguous" reads are mapped.

ADD COMMENTlink modified 3 months ago • written 3 months ago by Carlo Yague4.7k

Thanks for your comment. For the Unassigned_NoFeatures part, I ran your commands, and gave me similar chromosome names (chr*). I also tried to get the annotation file directly from gencode, and redo counting, did not improve much.

For the Unassigned_Ambiguity, I tried featureCounts -O, but it just improved just like from 3% to 5% of successfully assigned ones.

ADD REPLYlink written 3 months ago by Meghdad0

Are your reads trimmed adapter content and base calling quality?

ADD REPLYlink written 3 months ago by caggtaagtat800

Yes, using Cutadapt.

ADD REPLYlink written 3 months ago by Meghdad0

And did you check for rRNA contamination like genomax recommended? I, for example, always use sortMeRNA. You can sometimes also see this in a shift to the right of the curve in the GC content distribution plot of your FASTQC output.

ADD REPLYlink written 3 months ago by caggtaagtat800

For the Unassigned_Ambiguity, I tried featureCounts -O, but it just improved just like from 3% to 5% of successfully assigned ones.

This is weird. Using the -O option should make the number of Unassigned_Ambiguity reads drop to zero, while the number of Assigned read should, in your case, largely increase. Can you provide the detailed output of featureCounts (number of read assigned, etc) and the exact command you used ?

ADD REPLYlink written 3 months ago by Carlo Yague4.7k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 2379 users visited in the last hour