44% Successfully Assigned Fragments with featureCounts after 85% uniquely mapped reads with STAR
2.1 years ago
garbuzov ▴ 30

Hi there, I'm wondering if anybody can shed some light on what is happening during the count table step with featureCounts. I am losing more than half of my reads, even though my mapping statistics from STAR seem to be fine.

My library is 75bp paired end using the Nugen Ovation Universal kit. The RNA is from rat. I downloaded the NCBI genome and made the STAR index. Here is my command to run STAR:

STAR --runThreadN 12 \
--genomeDir <path to...>/genomes/rn6/ncbi/star \
--readFilesIn ${R1} ${R2} \
--outFileNamePrefix starMapped/\${job_name} \
--outSAMtype BAM Unsorted \
--seedSearchStartLmax 40 \
--outFilterScoreMinOverLread 0.5 \
--outFilterMatchNminOverLread 0.5


My mapping rate is 84-89%. A representative Log.final.out:

 UNIQUE READS:
Uniquely mapped reads % |       85.59%
Average mapped length |       147.92
Number of splices: Total |       23011150
Number of splices: Annotated (sjdb) |       19835290
Number of splices: GT/AG |       22429313
Number of splices: GC/AG |       180851
Number of splices: AT/AC |       22581
Number of splices: Non-canonical |       378405
Mismatch rate per base, % |       0.28%
Deletion rate per base |       0.02%
Deletion average length |       1.98
Insertion rate per base |       0.01%
Insertion average length |       1.66
MULTI-MAPPING READS:
% of reads mapped to multiple loci |       10.03%
% of reads mapped to too many loci |       0.38%
UNMAPPED READS:
% of reads unmapped: too many mismatches |       0.00%
% of reads unmapped: too short |       3.55%
% of reads unmapped: other |       0.45%


Next, I run featureCounts using the following command:

featureCounts -T 12 -p -t exon -g gene_id -a <path to...>/NCBI/Annotation/Genes/genes.gtf -o combined_counts.txt *.bam


My output from featureCounts looks like:

Successfully assigned fragments : 41071240 (44.6%)


And this is representative of one sample in the summary file:

Assigned         41243743
Unassigned_Ambiguity    259701
Unassigned_MultiMapping 30155153
Unassigned_NoFeatures   20857145


My question is, why am I losing so many reads at the step of making the count table? Why are multi-mappers ~10% with STAR and then ~30% with featureCounts?

Thanks!

rna-seq alignment RNA-Seq featureCounts STAR • 2.1k views
2.1 years ago
h.mon 33k

You are not showing a crucial piece of information that would prove me right (or wrong): the number of input reads. But here is my guess:

The figure STAR reports as "% of reads mapped to multiple loci" is relative to the number of input reads. The number featureCounts reports as "Unassigned_MultiMapping", however, is relative to the number of alignments: a read that maps to several loci is counted once per location. If 10% of your input reads are multimappers, but each maps to 4 locations, then based on the featureCounts output you would think you have about 30% multimappers.
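To make the arithmetic in this guess concrete, here is a minimal sketch. The 85% and 10% figures are rounded from the STAR log in the question, and the 4 loci per multimapper is the hypothetical value from the example above, not a measured one.

```shell
# Hypothetical illustration: loci_per_multi is an assumed value, not a measurement.
unique_pct=85      # % of input reads mapped uniquely (one alignment each)
multi_pct=10       # % of input reads mapped to multiple loci
loci_per_multi=4   # assumed average number of alignments per multi-mapping read

# Share of all alignment records that come from multi-mapping reads.
multi_share=$(awk -v u="$unique_pct" -v m="$multi_pct" -v k="$loci_per_multi" \
  'BEGIN { printf "%.1f", 100 * m * k / (u + m * k) }')
echo "Multimapper share of alignment records: ${multi_share}%"
```

Under these assumptions, multimappers make up roughly a third of the alignment records even though they are only a tenth of the input reads.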

P.S.: did you check whether the Nugen Ovation kit you are using really produces an unstranded library? The featureCounts command you are issuing treats your reads as coming from an unstranded library (the default, -s 0).

2.1 years ago
garbuzov ▴ 30

Ok, I think I understand what you're saying. I was so busy comparing percentages I didn't look at read counts.

So, for STAR I get:

                  Number of input reads |       73019489
UNIQUE READS:
Uniquely mapped reads number |       62360589


For featureCounts my # of assigned reads is:

 Assigned         41243743


And the total input to featureCounts is ~92.5 million fragments (the sum of the summary rows above), so yes, the huge drop in percentage makes sense now. But I still have a substantial drop in the number of unique fragments from STAR to featureCounts: 62,360,589 -> 41,243,743. What could explain that? Thanks,

PS: And yes, my library is unstranded. I played around with the options. Adding -s 1 drops the assigned read count to 1%.


Although I can't give you hard numbers, it is not uncommon to have a substantial drop between mapping rate and assignment to feature rate. It depends on several factors, and someone may chime in with more suggestions, but how good is the Rattus norvegicus annotation? In general, I consider human and mouse annotations to be of very high quality, with all other annotations being average at best - I am not familiar with the R. norvegicus annotation, though.
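As a rough check using only the numbers posted in this thread, the sketch below adds up the non-multimapping rows of the featureCounts summary and compares them with STAR's uniquely mapped read count. It also estimates the number of alignments per multi-mapping read, assuming STAR's rounded 10.03% figure applies to this sample. The helper `pct` is a name introduced here for illustration.

```shell
# Numbers copied from the STAR log and featureCounts summary posted in this thread.
star_input=73019489       # STAR: number of input reads
star_unique=62360589      # STAR: uniquely mapped reads number
assigned=41243743         # featureCounts: Assigned
ambiguity=259701          # featureCounts: Unassigned_Ambiguity
multimapping=30155153     # featureCounts: Unassigned_MultiMapping
no_features=20857145      # featureCounts: Unassigned_NoFeatures

# The three non-multimapping rows together should account for the unique reads.
non_multi=$((assigned + ambiguity + no_features))
echo "Assigned + Ambiguity + NoFeatures = $non_multi (STAR unique = $star_unique)"

# Hypothetical helper: percentage of numerator over denominator, one decimal.
pct() { awk -v n="$1" -v d="$2" 'BEGIN { printf "%.1f", 100 * n / d }'; }

assigned_pct=$(pct "$assigned" "$star_unique")
no_features_pct=$(pct "$no_features" "$star_unique")
echo "Of the unique reads: ${assigned_pct}% assigned, ${no_features_pct}% without a feature"

# Alignments per multi-mapping read, assuming 10.03% of input reads are multimappers.
loci_per_multi=$(awk -v m="$multimapping" -v n="$star_input" \
  'BEGIN { printf "%.1f", m / (0.1003 * n) }')
echo "Approximate alignments per multi-mapping read: $loci_per_multi"
```

The sum matches the STAR unique count exactly, which suggests the drop from 62,360,589 to 41,243,743 is almost entirely Unassigned_NoFeatures: uniquely mapped fragments that do not overlap any annotated exon.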

> PS: And yes, my library is unstranded. I played around with the options. Adding -s 1 drops the assigned read count to 1%.

Then try -s 2, because I don't think your library is unstranded. If your library were truly unstranded, an assignment rate of 1% would not be realistic: one would expect half of the reads to map to each strand, so about half of the reads should have been assigned. This looks like a "reverse stranded" library incorrectly treated as "forward stranded".
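One way to test this is to count the same BAM file under all three strandedness settings and compare the assignment rates. The sketch below only prints the commands (a dry run). The file names `genes.gtf` and `sample.bam` and the helper `fc_cmd` are placeholders introduced here; the other options are copied from the command in the question.

```shell
# Hypothetical placeholders: adjust the annotation and BAM paths to your own files.
annotation="genes.gtf"
bam="sample.bam"

# Build the featureCounts command for a given -s value:
# 0 = unstranded, 1 = stranded, 2 = reversely stranded.
fc_cmd() {
  echo "featureCounts -T 12 -p -s $1 -t exon -g gene_id -a $annotation -o counts_s$1.txt $bam"
}

# Dry run: print the three commands instead of executing them.
for s in 0 1 2; do
  fc_cmd "$s"
done
```

After running the printed commands, compare the Assigned row in each counts_s*.txt.summary file. For an unstranded library, -s 1 and -s 2 should each assign roughly half as many fragments as -s 0; for a reverse-stranded library, -s 2 should be close to -s 0 while -s 1 is near zero.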
