Question

featureCounts - Low Assigned rate - Locations of reads

0

3.2 years ago

chrys ▴ 60

Well hello there,

I am using featureCounts from the Subread package to count third-generation reads produced by Nanopore sequencing (MinION) and mapped to a reference genome. Although the overall basecall quality was high and the mapping rate was also very good (94%), featureCounts only produced assignment rates of 50% to 60%.

The largest unassigned group is "Unassigned_NoFeatures", which made me wonder where those reads mapped.

Assigned    1057725  
Unassigned_Unmapped 62207  
Unassigned_Read_Type    0  
Unassigned_Singleton    0  
Unassigned_MappingQuality   0  
Unassigned_Chimera  0  
Unassigned_FragmentLength   0  
Unassigned_Duplicate    0  
Unassigned_MultiMapping 0  
Unassigned_Secondary    0  
Unassigned_NonSplit 0  
Unassigned_NoFeatures   457608  
Unassigned_Overlapping_Length   0  
Unassigned_Ambiguity    283748
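
To put the summary above into proportions, here is a minimal Python sketch that parses the counts and reports each category as a fraction of all reads. The function names (`parse_summary`, `fractions`) are illustrative only and not part of featureCounts. With these numbers, roughly 57% of reads are assigned, about 25% fall into Unassigned_NoFeatures and about 15% into Unassigned_Ambiguity.

```python
# Counts copied from the featureCounts summary above (empty categories omitted).
SUMMARY = """\
Assigned 1057725
Unassigned_Unmapped 62207
Unassigned_NoFeatures 457608
Unassigned_Ambiguity 283748
"""


def parse_summary(text):
    """Parse 'category count' lines into a dict, skipping anything else."""
    counts = {}
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        try:
            counts[parts[0]] = int(parts[1])
        except ValueError:
            # e.g. a header line such as 'Status <file name>' in a real summary file
            continue
    return counts


def fractions(counts):
    """Return each category as a fraction of the total number of reads."""
    total = sum(counts.values())
    return {name: value / total for name, value in counts.items()}


counts = parse_summary(SUMMARY)
for name, frac in fractions(counts).items():
    print(f"{name}\t{counts[name]}\t{frac:.1%}")
```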

I used a custom annotation GFF (Gencode + custom features) to count the mappings. I was wondering if somebody knows a tool or a straightforward way (other than checking IGV visually) to find out where those reads are located.

I am especially interested in whether we might have some kind of contamination by genomic DNA.

Any suggestions for QC / tools / procedures are welcome. Thanks!

RNA-Seq featureCount QC • 2.2k views

ADD COMMENT • link updated 3.1 years ago by Biostar 20 • written 3.2 years ago by chrys ▴ 60

2

If a read is mapped but does not overlap any feature in the GTF, then it lies in an intron or in an intergenic region. You can make a custom SAF file for featureCounts (see the manual) to count the reads for these features. Intergenic is the complement, over the entire genome, of the GTF entries of type="gene", and intronic is the entire genome minus the intergenic and exonic regions.
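
The interval arithmetic described above could be sketched in plain Python roughly as follows. This is only an illustration on one toy chromosome: the coordinates, the chromosome name `chr1`, the helper names (`merge`, `subtract`, `to_saf`) and the `+` placeholder strand are all assumptions. A real run would read gene and exon coordinates from the GTF and chromosome lengths from the reference. The output uses the five SAF columns (GeneID, Chr, Start, End, Strand) with 1-based inclusive coordinates.

```python
def merge(intervals):
    """Merge overlapping or adjacent 1-based inclusive intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def subtract(region, blockers):
    """Return the parts of region (start, end) not covered by any blocker."""
    start, end = region
    out = []
    cursor = start
    for b_start, b_end in merge(blockers):
        if b_end < cursor or b_start > end:
            continue
        if b_start > cursor:
            out.append((cursor, b_start - 1))
        cursor = max(cursor, b_end + 1)
    if cursor <= end:
        out.append((cursor, end))
    return out


def to_saf(label, chrom, intervals, strand="+"):
    """Format intervals as SAF rows; the strand here is only a placeholder."""
    return [
        f"{label}_{i}\t{chrom}\t{start}\t{end}\t{strand}"
        for i, (start, end) in enumerate(intervals, start=1)
    ]


# Toy data (hypothetical): one 1000 bp chromosome with two genes.
chrom, chrom_len = "chr1", 1000
genes = [(101, 400), (601, 900)]
exons = [(101, 150), (301, 400), (601, 700), (851, 900)]

# Intergenic = whole chromosome minus everything annotated as "gene".
intergenic = subtract((1, chrom_len), genes)
# Intronic = whole chromosome minus intergenic and exonic regions.
intronic = subtract((1, chrom_len), intergenic + exons)

saf_lines = ["GeneID\tChr\tStart\tEnd\tStrand"]
saf_lines += to_saf("intergenic", chrom, intergenic)
saf_lines += to_saf("intron", chrom, intronic)
print("\n".join(saf_lines))
```

The resulting table can be saved to a file and passed to featureCounts with `-F SAF`. Because the placeholder strand carries no meaning for intergenic regions, this sketch assumes unstranded counting.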

ADD REPLY • link 3.2 years ago by ATpoint 82k

2

You can also use the qualimap rnaseq tool to count the number/percentage of reads in exonic, intronic or intergenic regions: http://qualimap.conesalab.org/doc_html/analysis.html#rna-seq-qc

I believe that you only need the BAM and GTF files (if I remember correctly). Although you have a GFF file, you can convert it to GTF using gffread: https://github.com/gpertea/gffread
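
As a rough sketch of those two steps, the snippet below only assembles and prints the command lines; it does not run them. All file and directory names are hypothetical placeholders, and the options shown (`-T` and `-o` for gffread, `-bam`, `-gtf` and `-outdir` for qualimap rnaseq) should be checked against the versions you have installed.

```python
import shlex


def gffread_cmd(gff_path, gtf_path):
    """Command to convert a GFF file to GTF with gffread (-T selects GTF output)."""
    return ["gffread", gff_path, "-T", "-o", gtf_path]


def qualimap_rnaseq_cmd(bam_path, gtf_path, out_dir):
    """Command to run Qualimap's RNA-seq QC on a BAM file with a GTF annotation."""
    return ["qualimap", "rnaseq", "-bam", bam_path, "-gtf", gtf_path, "-outdir", out_dir]


# Hypothetical file names, for illustration only.
convert = gffread_cmd("custom_annotation.gff", "custom_annotation.gtf")
qc = qualimap_rnaseq_cmd("nanopore_reads.bam", "custom_annotation.gtf", "qualimap_rnaseq_out")

# Print shell-ready commands; to execute them, pass the lists to
# subprocess.run(cmd, check=True) once the tools are on your PATH.
print(shlex.join(convert))
print(shlex.join(qc))
```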

ADD REPLY • link 3.2 years ago by antonioggsousa 3.2k

0

Thanks to you both!

Qualimap was an excellent suggestion. Exactly what I am looking for. GFF to GTF conversion should also be no problem.

I found it puzzling that with ultra-long reads one would get so many unassigned counts. Thank you.

ADD REPLY • link 3.2 years ago by chrys ▴ 60