featureCounts - Low Assigned rate - Locations of reads
16 months ago
chrys ▴ 60

Well hello there,

I am using featureCounts from the Subread package to count third-generation reads produced by Nanopore sequencing (MinION) and mapped to a reference genome. Basecall quality was high overall and the mapping rate was also very good (94%), but featureCounts only assigned 50% to 60% of the reads.

The largest unassigned category is "Unassigned_NoFeatures", which made me wonder where those reads mapped.

Assigned                        1057725
Unassigned_Unmapped             62207
Unassigned_Read_Type            0
Unassigned_Singleton            0
Unassigned_MappingQuality       0
Unassigned_Chimera              0
Unassigned_FragmentLength       0
Unassigned_Duplicate            0
Unassigned_MultiMapping         0
Unassigned_Secondary            0
Unassigned_NonSplit             0
Unassigned_NoFeatures           457608
Unassigned_Overlapping_Length   0
Unassigned_Ambiguity            283748
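
For reference, the summary above works out to roughly 57% assigned, 25% without a feature and 15% ambiguous. A minimal Python sketch of that arithmetic follows; the function names are made up for illustration, and the zero-count categories are left out of the embedded text for brevity. The parser skips any line that is not a name followed by an integer, so it should also tolerate the header line of a real featureCounts `.summary` file, although that has not been tested here.

```python
# Counts copied from the summary in the question (zero-count rows omitted).
SUMMARY_TEXT = """\
Assigned 1057725
Unassigned_Unmapped 62207
Unassigned_NoFeatures 457608
Unassigned_Ambiguity 283748
"""


def parse_summary(text):
    """Return {category: read count} from whitespace-separated summary lines."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines and header lines (second field is not an integer).
        if len(parts) != 2 or not parts[1].isdigit():
            continue
        counts[parts[0]] = int(parts[1])
    return counts


def category_fractions(counts):
    """Return each category's share of all reads in the summary."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


counts = parse_summary(SUMMARY_TEXT)
fractions = category_fractions(counts)

# Print categories from largest to smallest share.
for name, frac in sorted(fractions.items(), key=lambda kv: -kv[1]):
    print(f"{name:<24}{counts[name]:>10}{frac:>8.1%}")
```

Note that the ambiguity category is not small either: those reads overlap more than one gene in the annotation, which is a separate issue from reads that overlap no feature at all.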

I used a custom annotation GFF (GENCODE plus custom features) to count the mappings. Does anybody know a tool or a straightforward way (other than checking visually in IGV) to find out where those reads are located?

I am especially interested in whether we might have some kind of contamination by genomic DNA.

Any suggestions for QC / Tools / procedures are welcome. Thanks !

RNA-Seq featureCount QC • 952 views

If a read is mapped but does not overlap any feature in the GTF, then it lies in an intron or in an intergenic region. You can make a custom SAF file for featureCounts (see the manual) to count the reads for these regions. Intergenic is the complement, over the entire genome, of the GTF entries of type "gene"; intronic is the entire genome minus the intergenic and exonic regions.
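
The interval arithmetic described above can be sketched in plain Python for a single chromosome. This is only an illustration under some assumptions: the function names and the toy coordinates are invented, coordinates are 1-based and inclusive as in GTF and SAF, and genes are merged regardless of strand, so overlapping genes on opposite strands are treated as one genic block. The strand column in the SAF rows is a placeholder, which should not matter when counting unstranded (the featureCounts default, `-s 0`).

```python
def merge(intervals):
    """Merge overlapping or adjacent (start, end) intervals; returns a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def subtract(region_start, region_end, blockers):
    """Return the parts of [region_start, region_end] not covered by blockers.

    blockers must already be merged and sorted (see merge above).
    """
    gaps = []
    cursor = region_start
    for start, end in blockers:
        if end < cursor:
            continue  # blocker lies entirely before the current position
        if start > region_end:
            break  # blocker lies entirely after the region
        if start > cursor:
            gaps.append((cursor, start - 1))
        cursor = max(cursor, end + 1)
    if cursor <= region_end:
        gaps.append((cursor, region_end))
    return gaps


def intergenic_and_intronic(chrom_len, genes, exons):
    """Intergenic = chromosome minus genes; intronic = genes minus exons."""
    gene_iv = merge(genes)
    exon_iv = merge(exons)
    intergenic = subtract(1, chrom_len, gene_iv)
    intronic = []
    for g_start, g_end in gene_iv:
        intronic.extend(subtract(g_start, g_end, exon_iv))
    return intergenic, intronic


def saf_lines(chrom, regions, label):
    """Format regions as SAF rows: GeneID, Chr, Start, End, Strand (tab-separated)."""
    # "+" is a placeholder strand; these regions have no real orientation.
    return [f"{label}_{i}\t{chrom}\t{s}\t{e}\t+"
            for i, (s, e) in enumerate(regions, start=1)]


# Toy annotation (hypothetical coordinates) on a 1000 bp chromosome.
genes = [(101, 400), (601, 800)]
exons = [(101, 200), (301, 400), (601, 650), (751, 800)]
intergenic, intronic = intergenic_and_intronic(1000, genes, exons)

saf = ["GeneID\tChr\tStart\tEnd\tStrand"]
saf += saf_lines("chr1", intergenic, "intergenic")
saf += saf_lines("chr1", intronic, "intron")
print("\n".join(saf))
```

For a real genome the gene and exon intervals would be read from the annotation file per chromosome, and the chromosome lengths from the BAM header or a FASTA index. The resulting file can then be passed to featureCounts with `-F SAF`.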


You can also use the Qualimap RNA-seq QC tool to get the number/percentage of reads mapping to exonic, intronic or intergenic regions: http://qualimap.conesalab.org/doc_html/analysis.html#rna-seq-qc

I believe that you only need the BAM and the GTF files (if I remember correctly). Since you have a GFF file, you could convert it to GTF using gffread: https://github.com/gpertea/gffread


Thanks to you both !

Qualimap was an excellent suggestion, exactly what I was looking for. The GFF to GTF conversion should be no problem either.

I found it puzzling that with ultra-long reads one would get so many unassigned reads. Thank you.

