Question: Very low percentage of assigned reads using featureCounts
2
gravatar for Meghdad
12 months ago by
Meghdad20
Meghdad20 wrote:

Dear all,

I am using featureCounts to count the number of reads per gene in an RNA-seq experiment. The problem is that I get very low number of assigned reads using featureCounts:

Assigned       3178736
Unassigned_Unmapped 0
Unassigned_MappingQuality   0
Unassigned_Chimera  0
Unassigned_FragmentLength   0
Unassigned_Duplicate    0
Unassigned_MultiMapping 0
Unassigned_Secondary    0
Unassigned_NonSplit 0
Unassigned_NoFeatures   41994410
Unassigned_Overlapping_Length   0
Unassigned_Ambiguity    77571683

I use the following:

featureCounts -M --fraction -g gene_id -T 6 -s 2 -a annotation/hg38.gtf -o counts/mycounts Alignment/*.bam

--The alignment files look fine in IGV. -- Both reference genome and annotation file are from UCSC. -- The library is reversely stranded, as I can also see in IGV.

Thanks in advance for your comments.

rna-seq next-gen • 887 views
ADD COMMENTlink modified 12 months ago by Carlo Yague5.0k • written 12 months ago by Meghdad20

Are you sure your library is reversed stranded?

ADD REPLYlink written 12 months ago by caggtaagtat1.0k

Yes, and anyway I tried -s 0 and -s 1 in the featureCounts, did not improve!

ADD REPLYlink written 12 months ago by Meghdad20

You are counting "multi-mapped" reads. Specific reason for doing that?

Is there a chance you have rRNA (or DNA) contamination in your data?

The alignment files look fine in IGV

What does this mean? That you see appropriate pileup of reads over exons and that there is no general scatter of alignments on the genome? Genes you expect to go up/down are behaving as expected?

ADD REPLYlink modified 12 months ago • written 12 months ago by genomax85k

Thanks for your reply. I didn't check for rRNA contamination. Could that be the reason? If I don't count multi-mapped reads, anyway doesn't change much the low percentage of assigned reads. About the IGV part, I meant I see nice coverage on exons, directionality of reads goes with stranded library preparation, and I see differences in reads at genes I expect so.

ADD REPLYlink written 12 months ago by Meghdad20
2
gravatar for Carlo Yague
12 months ago by
Carlo Yague5.0k
Canada
Carlo Yague5.0k wrote:

About the reads Unassigned_NoFeatures

Many reads are mapped on regions that are not found in your annotation. It could be that your annotation is incomplete, or that you have an unusual level of extragenic transcription in your samples. But first, I would check if there are discrepancies between the chromosome name in your alignment files and the chromosome names in your annotation file. You can easily check this using:

samtools idxstats Alignment/myBam.bam | cut -f 1 | sort
cut -f 1 annotation/hg38.gtf | sort | uniq

About the reads Unassigned_Ambiguity

Many reads are mapped to genomic regions that correspond to multiple gene_id in your annotation. Once again, check if your annotation make sense (you don't want the transcripts isoforms of one gene to be annotated under multiple gene_id for instance). Also, using featureCounts -O option allows you to count reads overlapping multiple features. It is not advised to do it (it causes ambiguity), but you can just use the -O option to check on what kind of genes the "ambiguous" reads are mapped.

ADD COMMENTlink modified 12 months ago • written 12 months ago by Carlo Yague5.0k

Thanks for your comment. For the Unassigned_NoFeatures part, I ran your commands, and gave me similar chromosome names (chr*). I also tried to get the annotation file directly from gencode, and redo counting, did not improve much.

For the Unassigned_Ambiguity, I tried featureCounts -O, but it just improved just like from 3% to 5% of successfully assigned ones.

ADD REPLYlink written 12 months ago by Meghdad20

Are your reads trimmed adapter content and base calling quality?

ADD REPLYlink written 12 months ago by caggtaagtat1.0k

Yes, using Cutadapt.

ADD REPLYlink written 12 months ago by Meghdad20

And did you check for rRNA contamination like genomax recommended? I, for example, always use sortMeRNA. You can sometimes also see this in a shift to the right of the curve in the GC content distribution plot of your FASTQC output.

ADD REPLYlink written 12 months ago by caggtaagtat1.0k

For the Unassigned_Ambiguity, I tried featureCounts -O, but it just improved just like from 3% to 5% of successfully assigned ones.

This is weird. Using the -O option should make the number of Unassigned_Ambiguity reads drop to zero, while the number of Assigned read should, in your case, largely increase. Can you provide the detailed output of featureCounts (number of read assigned, etc) and the exact command you used ?

ADD REPLYlink written 12 months ago by Carlo Yague5.0k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.3.0
Traffic: 1155 users visited in the last hour